Detecting Misstated Financial Statements With Deep Learning and Interactive Visualization

Katrina Ni, Leiling Tao

Data Science Pipeline

[Figure: data science pipeline diagram]

How to run

Set up environment

  • Clone the repository
     git clone https://github.com/nilichen/cmpt733-project.git
     cd cmpt733-project
  • Initialize the folder with virtualenv
     virtualenv venv
     source venv/bin/activate
  • Install the packages
     pip install -r requirements.txt

Download and Preprocess the data

Train the model

  • See models_with_raw_data.py, models_with_ratios.py, and models_ensemble.py for reference

    The prediction results from the meta-model are already included in the downloaded data.zip as results.csv

Deploy Dash application offline

  • Merge preprocessed data with the results from the meta-model

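     import pandas as pd

     # Load the meta-model predictions and the preprocessed ratios
     results = pd.read_csv('data/results.csv')
     df_ratios_only = pd.read_csv('data/annual_compustat_ratios.zip')
     # Attach pred_prob by (gvkey, fyear); the default inner join keeps matched rows only
     df_ratios_only = df_ratios_only.merge(
         results[['fyear', 'gvkey', 'pred_prob']], on=['gvkey', 'fyear'])
     df_ratios_only.to_csv('data/annual_compustat_ratios.zip', index=False)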
  • Run on a local server (see https://dash.plot.ly/deployment to deploy the app on Heroku)

    With the downloaded data, you can skip the preprocessing and training steps and run this command directly:

    python app.py
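
The merge step above overwrites annual_compustat_ratios.zip in place, so running it a second time would likely leave duplicate pred_prob_x / pred_prob_y columns. Below is a minimal sketch of a re-runnable variant, shown on toy in-memory data. The helper name merge_predictions, the roa column, and the sample values are hypothetical, and the one-row-per-(gvkey, fyear) check is an assumption about the data rather than something the repository states.

```python
import pandas as pd


def merge_predictions(ratios: pd.DataFrame, results: pd.DataFrame) -> pd.DataFrame:
    """Attach pred_prob to the ratios table, keyed on (gvkey, fyear)."""
    # Drop any pred_prob left over from a previous run so the merge can be repeated.
    ratios = ratios.drop(columns=["pred_prob"], errors="ignore")
    preds = results[["gvkey", "fyear", "pred_prob"]]
    # Assumption: each (gvkey, fyear) pair appears at most once on both sides;
    # validate="one_to_one" raises pandas.errors.MergeError otherwise.
    return ratios.merge(preds, on=["gvkey", "fyear"], how="inner",
                        validate="one_to_one")


# Toy stand-ins for annual_compustat_ratios.zip and results.csv (hypothetical values).
ratios = pd.DataFrame({
    "gvkey": [1001, 1001, 1002],
    "fyear": [2015, 2016, 2015],
    "roa": [0.05, 0.07, -0.02],
})
results = pd.DataFrame({
    "gvkey": [1001, 1002],
    "fyear": [2015, 2015],
    "pred_prob": [0.12, 0.81],
})

merged = merge_predictions(ratios, results)
# Merging the already-merged frame again should change nothing.
merged_again = merge_predictions(merged, results)
print(merged)
```

Because the join is an inner join, firm-years without a prediction (here gvkey 1001, fyear 2016) are dropped from the merged table; pass how="left" instead if those rows should be kept with a missing pred_prob.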

The Data Product

Dashboard: https://financial-dashboard-app.herokuapp.com/
