gu12934/LHL_Final_Capstone_Project


Credit Spending Habits Visualization 🚀 🥇
Credit Spending in India using Tableau and Streamlit 💳


Project/Goals 🎈


In this project, we combine and practice what we have learned throughout this course, including:

  • Writing SQL queries using pgAdmin

  • Cleaning data using Python

  • Creating an interactive dashboard with Tableau

  • Applying the bootcamp learnings in a single end-to-end project: data retrieval, EDA / cleaning, statistical modeling (optional), Tableau

  • Main deliverables: Tableau dashboard(s) and presentation, plus Jupyter notebooks if used


Project Structure

├── data                      <- Data folder
│   ├── cleaned_data_india_credit.csv <- Cleaned data source
│   └── credit_card_india.csv <- Original data source
│
├── output                    <- Images used in the project and output files
│   ├── Credit Dash.png       <- Dashboard snapshot
│   ├── sql_output.md         <- Output of SQL queries
│   └── ERD_diagram.png       <- Entity relationship diagram
│
├── src/modules               <- Source code
│   ├── cleaning_data.ipynb   <- Code for data cleaning
│   ├── SQL_Credit_card.ipynb <- Code for SQL queries
│   ├── EDA.ipynb             <- EDA file
│   ├── credit-card-spending-ml.ipynb <- Machine learning files
│   ├── Credit_viz.twbx       <- Tableau file with dashboard
│   └── streamlit             <- Folder for all the Streamlit app code
│       ├── app.py            <- Viz app for different plots
│       ├── eda_app.py        <- pandas profiling
│       └── requirements.txt  <- List of all dependencies
│
├── __init__.py               <- Package initializer
└── README.md                 <- Project documentation

Files Used



Process ⏩

  • Step 1: Acquire the dataset, import it into a Jupyter notebook, clean it, and export the cleaned file

  • Step 2: Import the cleaned file into Tableau and SQL

  • Step 3: Run SQL queries to answer "top 5" questions such as: which cities have the most fraud, which gender has the most fraud, and which credit card type had the most fraud

  • Step 4: Build an interactive Tableau dashboard that filters by city and date
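As a rough illustration of Step 1, the sketch below shows one way the cleaning pass could look in pandas. It is not the code from `cleaning_data.ipynb`: the column names (`City`, `Date`, `Card Type`, `Amount`), the sample rows, and the helper name `clean_credit_data` are assumptions made for the example.

```python
import io

import pandas as pd


def clean_credit_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw transactions table (hypothetical schema)."""
    df = raw.copy()
    # Normalise headers, e.g. "Card Type" -> "card_type".
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    # Drop exact duplicate rows and rows without an amount.
    df = df.drop_duplicates().dropna(subset=["amount"])
    # Parse dates; values that cannot be parsed become NaT and are dropped.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"])
    # Trim stray whitespace around city names.
    df["city"] = df["city"].str.strip()
    return df.reset_index(drop=True)


if __name__ == "__main__":
    # Made-up rows, only to exercise the function.
    sample = io.StringIO(
        "City,Date,Card Type,Amount\n"
        " City A ,2014-10-29,Gold,100\n"
        " City A ,2014-10-29,Gold,100\n"
        "City B,not a date,Platinum,200\n"
        "City C,2014-08-27,Silver,\n"
    )
    cleaned = clean_credit_data(pd.read_csv(sample))
    print(cleaned)
    # Export step (file name taken from the project structure above):
    # cleaned.to_csv("cleaned_data_india_credit.csv", index=False)
```

Keeping the cleaning logic in a single function makes it easy to rerun on a fresh export before loading the result into Tableau or a database.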


Visualization 📊

[Images: five visualization screenshots]


SQL Queries 📉
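The queries themselves are not reproduced in this section. As a hedged illustration of the "top 5" style of question described in the process, the sketch below runs a grouped, ordered, limited query against an in-memory SQLite database. The table name `transactions`, its columns, the helper names, and every row are made up for the example; the project's real schema may differ.

```python
import sqlite3

# Made-up demo rows: (city, card_type, amount).
ROWS = [
    ("City A", "Gold", 500),
    ("City B", "Silver", 300),
    ("City A", "Platinum", 250),
    ("City C", "Gold", 100),
    ("City B", "Gold", 50),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical transactions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (city TEXT, card_type TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", ROWS)
    return conn


def top_cities_by_spend(conn: sqlite3.Connection, limit: int = 5):
    """Return (city, total_spend) tuples, highest total first."""
    query = """
        SELECT city, SUM(amount) AS total_spend
        FROM transactions
        GROUP BY city
        ORDER BY total_spend DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for city, total in top_cities_by_spend(conn):
        print(city, total)
```

The same `GROUP BY` / `ORDER BY` / `LIMIT` shape carries over to PostgreSQL in pgAdmin; swapping `city` for another column (for example `card_type`) answers the other "top 5" questions.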


Word Cloud ☁️

[Image: word cloud]


Streamlit 🧑‍🚀

  • This GIF showcases the pandas profiling module deployed on Streamlit, which lets you do EDA by uploading a dataset

[GIF: pandas profiling app on Streamlit]

  • This app also lets you do EDA on any uploaded dataset: it creates various plots for your analysis and is also deployed on Streamlit

[Image: plotting app on Streamlit]

  • Try the app here: EDA app deployment on Streamlit Cloud
  • Note: the deployed app currently does not use the uploaded dataset (the cause is unknown). It is supposed to behave as in the GIF above, and it does when run on localhost
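The cause of the upload problem is not known, so the following is only a sketch of the general pattern, not a fix for the deployed app. In Streamlit, `st.file_uploader` returns `None` until a file is chosen and a file-like object afterwards, which `pd.read_csv` can read directly. The helper name `load_dataset` and the idea of a fallback frame are assumptions for the example; the helper avoids importing Streamlit so it can be tried on its own.

```python
import io

import pandas as pd


def load_dataset(uploaded, fallback: pd.DataFrame) -> pd.DataFrame:
    """Prefer the user's upload; use the fallback only when nothing was uploaded."""
    if uploaded is None:
        return fallback
    # An uploaded file is file-like, so pandas can parse it directly.
    return pd.read_csv(uploaded)


# Inside a Streamlit script the call could look like this (not run here):
#   uploaded = st.file_uploader("Upload a CSV", type="csv")
#   df = load_dataset(uploaded, fallback=default_df)

if __name__ == "__main__":
    default_df = pd.DataFrame({"city": ["City A"], "amount": [100]})
    fake_upload = io.BytesIO(b"city,amount\nCity B,250\n")
    print(load_dataset(None, default_df))
    print(load_dataset(fake_upload, default_df))
```

Checking explicitly for `None` ensures the app only falls back to a bundled dataset when the user has not uploaded anything.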

The Streamlit App

To run the Streamlit app, run the following command from the folder that contains app.py:

streamlit run app.py

Presentation 🌠

https://docs.google.com/presentation/d/1zzXzLE6kJSKPbSglUbs9xWFQATFGJms0aH-3kAz8Uek/edit#slide=id.p

https://public.tableau.com/app/profile/gurmol.sohi/vizzes

https://public.tableau.com/app/profile/gurmol.sohi/viz/Credit_viz/Dashboard1

  • Clicking a city updates all the other graphs; you can also select specific months to gain insight

Results

  • Mumbai and Bengaluru had the greatest spending
  • Most spending was on bills, food, and fuel
  • Zira had the highest percentage of Gold cards
  • Achalpur had the lowest percentage of Gold cards
  • The highest-spend month was August, with a Platinum card
  • In Greater Mumbai, Bills was the highest expense type and Entertainment the lowest
  • Greater Mumbai accounted for 14% of the total spending in the dataset
  • 01/2015 had the highest amount of spending
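Figures such as a city's share of total spending can be reproduced with a grouped sum. The sketch below is illustrative only: the column names `city` and `amount`, the helper name `spend_share_by_city`, and the demo rows are assumptions, and the numbers are not taken from the project's dataset.

```python
import pandas as pd


def spend_share_by_city(df: pd.DataFrame) -> pd.Series:
    """Return each city's percentage of total spending, largest first."""
    totals = df.groupby("city")["amount"].sum()
    return (totals / totals.sum() * 100).sort_values(ascending=False)


if __name__ == "__main__":
    # Made-up rows, only to show the shape of the result.
    demo = pd.DataFrame(
        {
            "city": ["City A", "City A", "City B", "City C"],
            "amount": [300, 300, 250, 150],
        }
    )
    print(spend_share_by_city(demo).round(1))
```

Grouping by a different column (for example an expense-type or card-type column) gives the analogous breakdown for the other results.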

[Images: five result charts]

Challenges 🎱


  • Creating interactivity in Tableau was a challenge
  • Creating a database with SQLite and running queries against it caused many issues at first
  • Importing the file from a different folder into the cleaning notebook caused some issues
  • Streamlit did not work at first because of issues with the pandas profiling library
  • A state column had to be added in Excel Online to improve the Tableau dashboard

Future Goals 🥅

  • Develop an ML model and make predictions on credit fraud
  • Compare with other datasets and with spending habits in different countries
  • Customer segmentation, credit risk, credit fraud detection (anomalies), and credit approval projects coming soon

About

Final capstone project using SQL, pgAdmin, API, Python
