3. Project Architecture

We are developing in Python with Jupyter Notebooks, and our production code lives in the "/main" folder.

/data : will include any project-specific files that the code reads as lists or dictionaries. Sub-folders will organize external and manually produced datasets, such as any data files we want to keep until a DBMS is implemented.
/code : will include Jupyter notebooks that perform tasks such as
- pulling data from Twitter
- cleaning the data in preparation for pre-processing
- pre-processing text data, e.g. removing stop words, stemming/lemmatization, and named-entity recognition (NER)
- running ML algorithms to classify unlabeled data by topic and subject
- storing scores, hyperparameter settings, and updated classifications
- writing results to the /output folder
/pipeline : as each notebook runs, its intermediate output files will be created here. Please refer to this article for an explanation of the organization.
/output : will contain the final output files from the code, to be used for
- user research data
- reports/presentations
- feeding applications with in-app content (this might require that a DBMS be implemented)

Use the "/sandbox" folder for storing experiments and playing around. For now, everything in the "/twitter" folder should go in "/sandbox." The "/outreach" folder is for organizing materials used to produce presentations.
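The folder layout above can be created, and the project-specific list files in /data read, with a few lines of Python. This is a minimal sketch, not code from the repository: it assumes /sandbox and /outreach sit at the repository root next to /main, and the `scaffold` and `load_list` helpers, along with the one-item-per-line list-file format, are hypothetical.

```python
from pathlib import Path

# Sub-folders of /main described on this page.
MAIN_SUBFOLDERS = ("data", "code", "pipeline", "output")
# Assumption: these folders live at the repository root, next to /main.
ROOT_FOLDERS = ("sandbox", "outreach")


def scaffold(repo_root: Path) -> dict[str, Path]:
    """Create any missing project folders and return their paths by name."""
    repo_root = Path(repo_root)
    paths = {name: repo_root / "main" / name for name in MAIN_SUBFOLDERS}
    paths.update({name: repo_root / name for name in ROOT_FOLDERS})
    for path in paths.values():
        # parents=True also creates /main; exist_ok=True makes reruns harmless.
        path.mkdir(parents=True, exist_ok=True)
    return paths


def load_list(path: Path) -> list[str]:
    """Read a hypothetical one-item-per-line list file (e.g. stop words).

    Blank lines and lines starting with '#' are skipped so the file can be
    edited by hand.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]
```

A notebook in /main/code could then call, for example, `load_list(paths["data"] / "stopwords.txt")`, where `stopwords.txt` is a hypothetical file name. Because `scaffold` only creates folders that are missing, it is safe to run on an existing clone.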

When the platform is complete, it will be used by cloning the repo, adding project-specific files, and running the notebooks in the "/main/code" folder. A team may use one or more notebooks in "/main/code" to carry out an end-to-end analysis project.
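Running several notebooks end to end could be automated with a small runner. This is a sketch under stated assumptions, not existing project code: it assumes notebooks in /main/code carry a numeric prefix (e.g. `01_pull_tweets.ipynb`, a hypothetical name) so that name order matches run order, and it uses the `nbformat` and `nbclient` packages to execute each notebook. `ordered_notebooks` and `run_notebooks` are illustrative names.

```python
from pathlib import Path
from typing import Iterable


def ordered_notebooks(code_dir: Path) -> list[Path]:
    """Return the notebooks directly inside code_dir, sorted by file name.

    Assumes a numeric-prefix naming convention (01_..., 02_...) so that name
    order is run order. The glob is not recursive, so Jupyter's
    .ipynb_checkpoints folder is ignored.
    """
    return sorted(Path(code_dir).glob("*.ipynb"))


def run_notebooks(notebooks: Iterable[Path], timeout: int = 600) -> None:
    """Execute each notebook in turn and save its outputs back in place."""
    # Imported here so ordered_notebooks works without Jupyter installed.
    import nbformat
    from nbclient import NotebookClient

    for path in notebooks:
        nb = nbformat.read(str(path), as_version=4)
        # Use the notebook's own folder as the working directory so relative
        # paths (e.g. to /data or /pipeline) resolve as they do in Jupyter.
        client = NotebookClient(
            nb,
            timeout=timeout,
            resources={"metadata": {"path": str(Path(path).parent)}},
        )
        client.execute()  # raises CellExecutionError if a cell fails
        nbformat.write(nb, str(path))
```

From the repository root, `run_notebooks(ordered_notebooks(Path("main/code")))` would execute every notebook in sequence. A team that needs only some of the notebooks can pass a subset of that list instead, which is why `run_notebooks` takes paths rather than a folder.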
