updating env files and cleaning saved images #96

Merged
royshadmon merged 1 commit into pre-main from xray-data-handler
Jun 9, 2025

Conversation

@royshadmon (Owner)

No description provided.

royshadmon merged commit 3bce8c2 into pre-main Jun 9, 2025
royshadmon added a commit that referenced this pull request Jun 9, 2025
* Marked where+which IBM imports are used

* Marked all IBM DataHandler and ModelUpdate instances

* Removed need for IBM DataHandler for Winniio model

* Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* [Forgot to push local_model_update.py] Transitioned into own version of ModelUpdate, removing its IBM dependency (manually for now requires changing ModelUpdate to LocalModelUpdate in lib/python3.9/site-packages/ibmfl/aggregator/fusion/iter_avg_fusion_handler.py)

* Updated IBM's Fusion Model to support our version of ModelUpdate (removes the need for manual changes)

* Created a base aggregation model class + federated average agg. model to replace IBM's IterAvgFusionHandler

* Removed temp ibm_fusion_handler.py

* Removed temp ibm_fusion_handler.py

* MNist works with Keras instead of PyTorch, removed IBM dependency

* Removed IBM federated learning, upgraded platform to Python 3.11/requirements, merged w/updated README

* Revert "Removed IBM federated learning & upgraded platform to Python 3.11"

* Revert "Revert "Removed IBM federated learning & upgraded platform to Python 3.11""

* Upgraded platform to run on Python 3.12 + requirements; updated README

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Merged Python 3.12 w/Miguel's FastAPI implementation

* Fixing FastAPI naming error

* Adding ability of logger to env's; added decision tree PDF

* Update winniio.env to have debugger support

* Create mnist.env example

* Update .gitignore to allow mnist.env

* Update .gitignore to allow both .env to be unwritten

* Fixed pathing for logs and winniio bug fix

* Fixed errors for release 1; formatting

* final release 1 updates. refactoring directory names.  everything works on both mnist and winniio dataset

* adding env files back to main

* cleaning codebase

* Implemented index key into initialization + README; added missing torchvision requirement; fixed typos

* added `blockchain get index` ability; added node_type to policy definitions; tested w/`blockchain get [query]`

* Added db validation check when a node starts up; adjusted logger for node_server / node

* Converted Flask to FastAPI for continuous training code (originally written in Flask by Chahel); fixed merge conflicts

* Update README.md for continue-training command

* some docker stuff done, working on make commands

* Fixed file write paths to include/sorted by index

* Fixed round 1 bug for continuous training; added {index}-r to the blockchain to hold most recent aggregated model file

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* created a new branch from pre-main where im working on creating the docker images and containerizing each node type

* updating README and small code cleaning to PR

* Adjust API calls to support dynamic nodes; minParams also dynamically adjusted for newly added nodes in middle of training

* Adjusted minParams to prevent stalling when it is greater than the number of active nodes

* Update README.md for dynamic nodes

* Update README.md for pathing

* working on letting app2.py work with platform_cmponents

* Enabled threading for node initialization (originally from Nikolas' code); fixed bugs and loggers

* Added new endpoint to update minParams (available after initialization); removed dynamic minParams when new nodes are added during training; updated README w/new endpoint

* Checks if index is unique; modularized initialization portion of aggregator and nodes

* app2.py correctly starts aggregator server

* updating indexing to write files to a directory named after the index, not to a filename that includes the index name. also adding index parameter to the continue-training functionality. changes also require updates to env files, so please take note

* updating README with the continue-training rest call

* removing unused variable from env files

* Added index component to `/start-training`; updated README

* when adding a new node to the training process, it starts at the most recent round

* removing commented code

* Adding to-do's; bug fixes

* pc work from the previous meeting 5/1

* Added aggregator and node modularity (except for different data handlers); implemented simultaneous training processes on the same servers but only works if the same model is training on all processes; indicated which model is running during progression (minus the progress bar); ensured training is multithreaded and dynamic; index is now required for all current endpoints; fixed README

* Update README.md for module path

Will be changed later for simplicity of user

* testing docker build on pc

* 5/5/25 starting on make functionality, still having some issues with envs and running the container with correct files

* Tested multiple (different) data handlers running concurrently; fixed README; passed db through initialization command; moved DB check in node_server from lifespan() to initialization --> marking the end of testing for training of multiple models

* Adjusted some logger messages

* Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)

* fixing directories on default .env files

* 5/6 work, docker build runs aggregator correctly, need to access from host browser

* 5-8 merging pre-main into containerization

* Update README.md (forgot commas on command examples)

* aggregator and nodes deployed as docker containers using a single docker compose file, hanging on training

* updating env files

* Revert "Converted module_path to module_file to make it easier on user (appended file name to end of path found in .env)"

This reverts commit 39a335a.

* Reworked "converting module_path to module_path";passed db_name; fixed README

* Update docker-compose.yaml

* updated readme to match pre-main

* updated readme to include containerization

* updated readme to include containerization

* Added message at the end of training process; checked that module_path existed; checked if node actually initialized; started aggregator direct inference; some code cleaning

* Added direct_inference to aggregator (takes in list of test data and list of labels/predictions); enabled aggregator to have an fl_model (in both data handlers); updated direct_inference in the mnist data handler (will update for winniio later)

* Updated agg. direct_inference for winniio; direct inference input now allows list of elements of any type (conversions and validations are done within the data_handler); updated README

* updated README w/WINNIIO direct_inference example

* added logs to node_server.py, updated dockers to include the cuda path

* Commented plan for updating listen_for_update_agg()

* Updated aggregator to preemptively pull node model links in listen_for_update_agg() (new function for reading/fetching node model links)

* replaced absolute path in docker-compose with a relative path (no longer needs to be updated), removed cuda usage in the containerized apis

* removed unused app.py, renamed app2.py to app.py

* pulled node_server.py from pre-main

* Update README.md

* updated README to include running specific apis only

* updated README to include taking down all the apis

* fix for continue-training that keeps end_round state

* updating gitignore to prevent default env files in both EdgeLake and edgefl from being overwritten

* Added db script and .env files for chest xray bbox model

* Added chest xray bbox data handler (not yet tested) and updated .env's + requirements.txt

* Renamed bbox data handler for consistency

* Tested training of BBox data handler (need to reduce the agg/model file sizes because nodes take too long copying; also still need to test inference); updated README w/Kaggle setup

* Reduced model size of the chest xrays bbox data handler to fix the slow copying (invalid load key bug); inference tested and working

* updating env files and cleaning saved images (#96)

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: royshadmon <16313057+royshadmon@users.noreply.github.com>
Co-authored-by: Miguel61823 <mmascare@ucsc.edu>
Co-authored-by: David Wu <122853894+DDublue@users.noreply.github.com>
Co-authored-by: Miguel61823 <146488686+Miguel61823@users.noreply.github.com>
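
The history above mentions a base aggregation model class plus a federated-average model that replaces IBM's IterAvgFusionHandler. The following is a minimal sketch of that general pattern, not the code from this repository. The names `BaseAggregator`, `FedAvgAggregator`, and `aggregate` are hypothetical, and the sketch assumes each node submits its weights as a list of flat per-layer float lists and that a plain unweighted mean is used.

```python
from abc import ABC, abstractmethod


class BaseAggregator(ABC):
    """Hypothetical base class: combines many node updates into one model."""

    @abstractmethod
    def aggregate(self, node_weights: list[list[list[float]]]) -> list[list[float]]:
        """node_weights holds, for each node, a list of flat per-layer weights."""


class FedAvgAggregator(BaseAggregator):
    """Hypothetical federated-average aggregator (assumed unweighted mean)."""

    def aggregate(self, node_weights):
        if not node_weights:
            raise ValueError("no node updates to aggregate")
        averaged = []
        # zip(*node_weights) groups the same layer from every node.
        for layer_group in zip(*node_weights):
            # zip(*layer_group) groups the same weight position across nodes.
            averaged.append([sum(vals) / len(vals) for vals in zip(*layer_group)])
        return averaged


# Two nodes, each with two "layers" of weights.
node_a = [[1.0, 2.0], [0.0]]
node_b = [[3.0, 4.0], [2.0]]
averaged = FedAvgAggregator().aggregate([node_a, node_b])
print(averaged)  # [[2.0, 3.0], [1.0]]
```

Keeping the averaging behind an abstract `aggregate` method is one way to let other aggregation strategies be swapped in later without touching the caller.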
royshadmon added a commit that referenced this pull request Jun 12, 2025
* Pre-Main to Main push for XRay Data Handler (#95)

* fixing winniio demo

* updating mnist demo and data handler

* updating README

* updating xray demo

---------

Co-authored-by: Evan Brannon-Wu <46462132+ejbrannonwu@users.noreply.github.com>
Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: Miguel61823 <mmascare@ucsc.edu>
Co-authored-by: David Wu <122853894+DDublue@users.noreply.github.com>
Co-authored-by: Miguel61823 <146488686+Miguel61823@users.noreply.github.com>
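
One commit in the history adjusts minParams so that training does not stall when minParams is greater than the number of active nodes. A rough sketch of one way to do that is to cap the threshold at the active node count. The function names `effective_min_params` and `ready_to_aggregate` are hypothetical, and the capping rule is an assumption rather than the repository's actual logic.

```python
def effective_min_params(min_params: int, active_nodes: int) -> int:
    """Cap the required number of node updates at the number of active nodes.

    Hypothetical helper: assumes the aggregator waits for `min_params`
    updates per round and would otherwise wait forever if fewer nodes exist.
    """
    if min_params < 1:
        raise ValueError("min_params must be at least 1")
    if active_nodes < 1:
        raise ValueError("need at least one active node")
    return min(min_params, active_nodes)


def ready_to_aggregate(received: int, min_params: int, active_nodes: int) -> bool:
    """True once enough node updates have arrived for this round."""
    return received >= effective_min_params(min_params, active_nodes)


# minParams of 5 with only 3 active nodes: 3 updates are enough.
print(effective_min_params(5, 3))   # 3
print(ready_to_aggregate(3, 5, 3))  # True
print(ready_to_aggregate(2, 5, 3))  # False
```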
royshadmon added a commit that referenced this pull request Jun 12, 2025
* Code update (#97)

* Pre-Main to Main push for XRay Data Handler (#95)

---------

Co-authored-by: DDublue <theonly1living@gmail.com>
Co-authored-by: Evan Brannon-Wu <ejbrannonwu@gmail.com>
Co-authored-by: Evan Brannon-Wu <46462132+ejbrannonwu@users.noreply.github.com>
Co-authored-by: Miguel61823 <mmascare@ucsc.edu>
Co-authored-by: David Wu <122853894+DDublue@users.noreply.github.com>
Co-authored-by: Miguel61823 <146488686+Miguel61823@users.noreply.github.com>
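
The indexing commit in the history changes file writes so that files go into a directory named after the index instead of carrying the index in the filename. The sketch below only illustrates that difference. The helper names, the `file_write` root, the `agg_model.pkl` filename, and the old `{index}-{filename}` naming are all assumptions for illustration, not paths taken from this repository.

```python
from pathlib import PurePosixPath


def old_layout(root: str, index: str, filename: str) -> PurePosixPath:
    """Assumed earlier scheme: the index is embedded in the filename."""
    return PurePosixPath(root) / f"{index}-{filename}"


def new_layout(root: str, index: str, filename: str) -> PurePosixPath:
    """Scheme described in the commit: one directory per index."""
    return PurePosixPath(root) / index / filename


print(old_layout("file_write", "mnist-run", "agg_model.pkl"))
# file_write/mnist-run-agg_model.pkl
print(new_layout("file_write", "mnist-run", "agg_model.pkl"))
# file_write/mnist-run/agg_model.pkl
```

A per-index directory keeps the outputs of concurrent training processes separate without any filename parsing.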