Detecting Misstated Financial Statements With Deep Learning and Interactive Visualization

Katrina Ni, Leiling Tao

Data Science Pipeline

How to run

Set up environment

Clone the repository

 git clone https://github.com/nilichen/cmpt733-project.git
 cd cmpt733-project

Initialize the folder with virtualenv

 virtualenv venv
 source venv/bin/activate

Install the packages
```
 pip install -r requirements.txt
```

Download and Preprocess the data

Download data from https://drive.google.com/open?id=1Tt2y8qn8V5oTshr9nDNtbY_hmnz1XKet
Unzip the data and run the preprocessing script. It produces annual_compustat_ratios.zip if that file does not already exist in the data folder.

By default, annual_compustat_ratios.zip is already included in the downloaded data.zip.
```
 mkdir data
 unzip data.zip -d data/
 python preprocess.py
```
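Before moving on, it can help to confirm that the files the later steps read are actually in place. The following is a minimal sketch, not part of the repository: the helper `missing_data_files` is hypothetical, and it assumes the two files named in this README sit directly under `data/`.

```python
from pathlib import Path

# File names are the ones this README mentions; the helper itself is
# hypothetical and not part of the repository.
EXPECTED_FILES = ("annual_compustat_ratios.zip", "results.csv")


def missing_data_files(data_dir="data", expected=EXPECTED_FILES):
    """Return the expected file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All expected data files are present.")
```

If annual_compustat_ratios.zip is reported missing, rerunning `python preprocess.py` should regenerate it, as described above.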

Train the model

See models_with_raw_data.py, models_with_ratios.py, and model_ensemble.py for reference.

The prediction results from the meta-model are already included in the downloaded data.zip as results.csv.

Deploy Dash application offline

Merge preprocessed data with the results from the meta-model

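 import pandas as pd
 # Meta-model predictions, keyed by company (gvkey) and fiscal year (fyear)
 results = pd.read_csv('data/results.csv')
 # Preprocessed ratios; pandas infers zip compression from the file extension
 df_ratios_only = pd.read_csv('data/annual_compustat_ratios.zip')
 # Inner join: keeps only firm-years that have a prediction
 df_ratios_only = df_ratios_only.merge(
     results[['fyear', 'gvkey', 'pred_prob']], on=['gvkey', 'fyear'])
 df_ratios_only.to_csv('data/annual_compustat_ratios.zip', index=False)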
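Because the merge writes back to the same file it reads, running it a second time would leave suffixed pred_prob_x and pred_prob_y columns instead of a single pred_prob. The sketch below shows one way to make the step repeatable, using toy data. The helper name `attach_predictions`, the `ratio_a` column, and all values are made up for illustration, and it assumes results.csv holds at most one row per (gvkey, fyear).

```python
import pandas as pd


def attach_predictions(ratios, results):
    """Inner-join meta-model probabilities onto the ratios by (gvkey, fyear).

    Hypothetical helper, not part of the repository.
    """
    # Drop a stale pred_prob so a second run does not create
    # pred_prob_x / pred_prob_y columns.
    ratios = ratios.drop(columns=["pred_prob"], errors="ignore")
    return ratios.merge(
        results[["gvkey", "fyear", "pred_prob"]],
        on=["gvkey", "fyear"],
        how="inner",
        validate="many_to_one",  # raises MergeError if results has duplicate keys
    )


# Toy data; the values are made up for illustration only.
ratios = pd.DataFrame(
    {"gvkey": [1001, 1001, 1002], "fyear": [2010, 2011, 2010], "ratio_a": [0.5, 0.6, 1.2]}
)
results = pd.DataFrame(
    {"gvkey": [1001, 1002], "fyear": [2010, 2010], "pred_prob": [0.10, 0.85]}
)

merged = attach_predictions(ratios, results)
merged_again = attach_predictions(merged, results)  # safe to repeat
print(merged)
```

Note that the inner join drops firm-years with no prediction (here, gvkey 1001 in 2011). Passing `how="left"` instead would keep them with a missing pred_prob.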

Run on a local server. See https://dash.plot.ly/deployment if you want to deploy the app on Heroku.

You can skip the preprocessing and training steps and run this command directly with the downloaded data.
```
python app.py
```

The Data Product

https://financial-dashboard-app.herokuapp.com/
