https://predicting-persona-b09group04.medium.com/
git clone https://github.com/jonxsong/DSC180AB-Capstone.git
cd DSC180AB-Capstone
python run.py test
./config/data-params.json - configuration specifying the directory where data should be output
./config/hw-metric-histo-data-params.json - description of the dataset and features we utilize
./config/systems-sysinfo-unique-normalized-data-params.json - description of the dataset and features we utilize
./config/ucsd-apps-execlass-data-params.json - description of the dataset and features we utilize
./config/frgnd_backgrnd_apps-data-params.json - description of the dataset and features we utilize
./notebooks/eda.ipynb - notebook containing data explorations from DSC180B
./notebooks/dsc180a-notebook.ipynb - notebook containing data explorations from DSC180A
./src/data_exploration.py - file containing relevant methods for data exploration
./src/model.py - file containing relevant methods for data modelling
./requirements.txt - required packages
./run.py - call run.py to run data analysis
./data/out/... - this location holds all the images generated by the methods
./data/raw/... - this location should hold all the datasets downloaded from the link below
https://drive.google.com/drive/folders/1nNpwhzrbKUJd0ZwbCYLGQH49CKkKLTQ4?usp=sharing
The datasets we are using are too large for GitHub, so they are hosted at the link above. After downloading, store them in ./data/raw/.
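Before running the analysis, it can help to confirm that the datasets and config files are where this README says they should be. The following is a minimal sketch, not part of the repository: the function name `check_layout` and the `CONFIG_FILES` list are hypothetical helpers, and the only assumptions are the paths listed above. It does not inspect the contents of the config files beyond checking that they parse as JSON.

```python
import json
from pathlib import Path

# Config files listed in this README (paths relative to the repository root).
CONFIG_FILES = [
    "config/data-params.json",
    "config/hw-metric-histo-data-params.json",
    "config/systems-sysinfo-unique-normalized-data-params.json",
    "config/ucsd-apps-execlass-data-params.json",
    "config/frgnd_backgrnd_apps-data-params.json",
]


def check_layout(repo_root="."):
    """Return a list of problems found; an empty list means the layout looks ready."""
    root = Path(repo_root)
    problems = []

    # The raw datasets are not in the repository and must be downloaded manually.
    raw_dir = root / "data" / "raw"
    if not raw_dir.is_dir():
        problems.append(f"missing directory: {raw_dir}")
    elif not any(raw_dir.iterdir()):
        problems.append(f"no datasets found in: {raw_dir}")

    # Each config file should exist and contain valid JSON.
    for rel in CONFIG_FILES:
        cfg = root / rel
        if not cfg.is_file():
            problems.append(f"missing config: {cfg}")
            continue
        try:
            json.loads(cfg.read_text(encoding="utf-8"))
        except json.JSONDecodeError as err:
            problems.append(f"invalid JSON in {cfg}: {err}")

    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print(problem)
```

Run from the repository root, this prints nothing when every expected path is present; otherwise it prints one line per missing or unreadable item.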
Link to Project Report: https://docs.google.com/document/d/1IpWfuG2IxurT5LOMyudWpn3UOLsKYKdjbbwqNhPGlYk/edit?usp=sharing
Jon:
- Report + main ideas
- data analysis
- code breakdown
- repository structuring
- notebook outlining
- script writing
Vince:
- data modeling
- Report + targets
- data cleaning
- data explorations
- classifications
- Visual Presentation Checkpoint
- Website
- Final Report
- Slides
Keshan:
- data preparation
- tabled data
- key notes all throughout notebook
- graphs + graph analysis
- ATL work