3. Project Architecture
frhino edited this page Jan 13, 2019 · 5 revisions
We develop in Python and Jupyter Notebooks. Our production code lives in the "/main" folder, which is organized into the following sub-folders:
- /data : will include any project-specific files that the code files read as lists or dictionaries. Sub-folders will organize external and manually produced datasets, including any data files we want to keep until a DBMS is implemented.
- /code : will include Jupyter notebooks that perform tasks such as:
  - pulling data from Twitter
  - cleaning the data for pre-processing
  - pre-processing text data, such as removing stop words, stemming/lemmatization, and NER
  - running ML algorithms that classify untrained data by topic and subject
  - storing scores, hyperparameter settings, and updated classifications
  - outputting results to the output folder
- /pipeline : as code is run, respective output files will be created here. Please refer to this article for an explanation of the organization.
- /output : will contain the final file output from the code, to be used as:
  - user research data,
  - reports/presentations,
  - input to applications for in-app content (though this might require that a DBMS be implemented)

Use the "/sandbox" folder for storing experiments and playing around. Right now, everything in the "/twitter" folder should go in "/sandbox". The "/outreach" folder is for organizing materials for producing presentations.
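The folder layout described above can be sketched as a short Python script. This is only an illustration: the folder names come from this page, while the function name `create_layout` and the idea of creating the folders from code are assumptions, not part of the repo.

```python
from pathlib import Path

# Folder names are taken from this page; everything else here
# (function name, constants) is a hypothetical illustration.
MAIN_SUBFOLDERS = ("data", "code", "pipeline", "output")
TOP_LEVEL_FOLDERS = ("sandbox", "outreach")


def create_layout(root):
    """Ensure the folder skeleton exists under root and return its paths."""
    root = Path(root)
    folders = [root / "main" / name for name in MAIN_SUBFOLDERS]
    folders += [root / name for name in TOP_LEVEL_FOLDERS]
    for folder in folders:
        # exist_ok=True makes the call safe to repeat on an existing clone.
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_layout(".")` from the root of a clone would leave existing folders untouched and only add the missing ones.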
When the platform is complete, it will be used by cloning the repo, adding project-specific files, and running the code in the "/main/code" folder. A team may use one or more notebooks in the "/main/code" folder to accomplish an end-to-end analysis project.
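As a rough sketch of what one such end-to-end step might look like, the code below reads a project-specific stop-word list from "/main/data", pre-processes a few texts, and writes the result to "/main/output". The file names `stop_words.txt` and `tokens.csv`, the function names, and the simple regex-based tokenization are all assumptions made for illustration; the actual notebooks may work differently.

```python
import csv
import re
from pathlib import Path


def load_stop_words(path):
    """Read one stop word per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


def preprocess(text, stop_words):
    """Lowercase, drop URLs and @mentions, then remove stop words."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    tokens = re.findall(r"[a-z0-9#']+", text)
    return [token for token in tokens if token not in stop_words]


def run(main_dir, texts):
    """Pre-process texts and write them to <main_dir>/output/tokens.csv."""
    main_dir = Path(main_dir)
    # Hypothetical project-specific file kept in the data folder.
    stop_words = load_stop_words(main_dir / "data" / "stop_words.txt")
    out_path = main_dir / "output" / "tokens.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["text", "tokens"])
        for text in texts:
            writer.writerow([text, " ".join(preprocess(text, stop_words))])
    return out_path
```

In a notebook under "/main/code", this might be called as `run("..", tweets)`, where `tweets` is a list of strings produced by an earlier data-pulling step.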