This is a data analysis project exploring PDGA (Professional Disc Golf Association) player information.
- I was curious to learn about the distribution of skilled disc golfers across US states
- I had never pulled data from an API and this appeared to be a great real-world example
- I have played around with Pandas, but primarily in academic exercises, so I wanted to use it in the wild
- At a high level, this code pulls player data from the PDGA API and spits out a CSV
- There are distinct modules for PDGA API calls, CSV handling, and Pandas work
- When running main(), you can choose from different options depending on what you want to do (ADD MORE HERE)
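As a rough illustration of how options could be passed to main(), here is a minimal argparse sketch. The action names and the `--min-rating` flag are hypothetical placeholders, not the project's actual arguments.

```python
import argparse

# Hypothetical action names for illustration; the real script's options may differ.
ACTIONS = ("login", "pull-players", "clean", "merge")


def parse_args(argv=None):
    """Parse command-line arguments; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Pull and analyze PDGA player data.")
    parser.add_argument("action", choices=ACTIONS, help="which step to run")
    parser.add_argument(
        "--min-rating",
        type=int,
        default=1000,
        help="rating threshold for counting a player (assumed flag)",
    )
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try out without a shell.
args = parse_args(["pull-players", "--min-rating", "1000"])
print(args.action, args.min_rating)
```

Using `choices` lets argparse reject an unknown action with a clear error message before any API call is made.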
1. Pull down JSON of Eagle McMahon via Requests Library
2. Save sessid & session_name into a txt file to avoid hitting the API login every time
3. Save Eagle McMahon's JSON into a CSV
4. Pull down JSON for all US MPO players, saving it into a CSV along the way
5. Clean player info from the API via the CSV library
6. Get state population CSV (Census Bureau - including 54 "states")
7. Create dataframe merging CSVs with population data, number of players rated 1000+
8. Create heat map data viz (Notebook? Seaborn/Plotly/Dash/Flourish)
9. Refactor code (ex: truly use constants module)
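The merge in step 7 can be sketched roughly as follows. This is a minimal example on made-up rows, not the project's actual code: the column names (`state`, `rating`, `population`) and the population figures are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up sample data; column names are assumptions, not the PDGA API's real fields.
players_csv = io.StringIO(
    "pdga_number,state,rating\n"
    "1,CA,1012\n"
    "2,CA,985\n"
    "3,TX,1003\n"
    "4,TX,1030\n"
    "5,VT,1001\n"
)
population_csv = io.StringIO(
    "state,population\n"
    'CA,"39,000,000"\n'
    'TX,"29,000,000"\n'
    'VT,"640,000"\n'
)

players = pd.read_csv(players_csv)
# thousands="," parses "39,000,000" as an integer instead of a string.
population = pd.read_csv(population_csv, thousands=",")

# Add a boolean column based on a criterion, then count matches per state.
players["is_1000_rated"] = players["rating"] >= 1000
per_state = (
    players.groupby("state")["is_1000_rated"]
    .sum()
    .reset_index(name="players_1000_plus")
)

# Merge with population and compute players rated 1000+ per million residents.
merged = per_state.merge(population, on="state", how="left")
merged["per_million"] = merged["players_1000_plus"] / merged["population"] * 1_000_000
print(merged)
```

A per-capita column like `per_million` is one possible input for the heat map in step 8, since raw counts would mostly track state population.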
- HTTP Status Codes
- Requests Library
- Working with JSON
- Data Visualization Options: Tableau, Plotly, Flourish, Chartify, etc
- Newline characters - These primarily mark the end of a line
- Accessing APIs - Put in safety measures & sanity check results before scaling up (I pulled the same 200 records multiple times)
- Census Data is Helpful!
- When reading data from a CSV into a DataFrame, use engine='python'
- Iterating Through Dataframes
- Adding Columns to Dataframes Based on Criteria
- Aggregations in Pandas
- Merging Data in Pandas
- Use a constants module heavily for consistency - you'll get more robust code, more quickly and easily
- Exception Handling
- Reading numeric data into Pandas - watch out for commas in numbers, which cause columns to be read as strings (object dtype)
- Adding arguments to scripts - I used the argparse library
- For visualization: Dash, Plotly, Flourish, Chartify instead of Tableau
- Analyze European disc golf growth (Pick top few countries -> tournament & course growth)
- Fix a bug I noticed (it doesn't currently matter) when pulling player stats data, affecting the 'prize' & 'last_modified' columns (ex: PDGA number 148486)
- Move argument validation & parsing outside of main.py into its own module
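The API lesson noted earlier (sanity check results before scaling up, after pulling the same 200 records multiple times) can be sketched as a paging loop that stops as soon as a page repeats. This is only a sketch under assumptions: `fetch_page(offset, limit)` is a hypothetical stand-in for the real API call, and offset/limit paging and the `pdga_number` key are assumed rather than taken from the PDGA API.

```python
def fetch_all(fetch_page, page_size=200, max_pages=1000):
    """Collect records page by page, raising if a page repeats earlier records.

    fetch_page(offset, limit) is a hypothetical callable that returns a list
    of dicts, each with a "pdga_number" key (an assumed field name).
    """
    seen = set()
    records = []
    for page in range(max_pages):
        batch = fetch_page(page * page_size, page_size)
        if not batch:
            break
        ids = {rec["pdga_number"] for rec in batch}
        if ids & seen:
            # Same records again: the offset is probably being ignored.
            raise RuntimeError(f"page {page} repeats records already pulled")
        seen |= ids
        records.extend(batch)
        if len(batch) < page_size:
            break  # a short page means we reached the end
    return records


# Fake in-memory "API" so the loop can be tried without network access.
fake_data = [{"pdga_number": n} for n in range(450)]
all_records = fetch_all(lambda offset, limit: fake_data[offset:offset + limit])
print(len(all_records))
```

The `max_pages` cap is a second safety measure, so a misbehaving endpoint cannot keep the loop running indefinitely.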
Thank you Nathan Hoover and Jan Van Bruggen for your huge help on this project!