greenbean1/pdga-analysis

PDGA Analysis

This is a data analysis project exploring PDGA (Professional Disc Golf Association) player information.

Motivation

  1. I was curious to learn about the distribution of skilled disc golfers across US states
  2. I had never pulled data from an API and this appeared to be a great real-world example
  3. I have played around with Pandas, but primarily in academic exercises, so I wanted to use it in the wild

How Does This Work?

  1. At a high level, this code pulls player data from the PDGA API and writes it out as a CSV
  2. There are separate modules for PDGA API calls, CSV handling, and Pandas work
  3. When running main(), there are different options to choose from depending on the desired output (ADD MORE HERE)
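
The JSON-to-CSV part of step 1 can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `players_to_csv` and the field names (`pdga_number`, `first_name`, `rating`, `state_prov`) are assumptions, and the API call is replaced by a hard-coded list of made-up records standing in for a decoded JSON response.

```python
import csv
import io


def players_to_csv(players, fieldnames):
    """Return CSV text for a list of player dicts, keeping only the given columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        extrasaction="ignore",  # silently drop keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for player in players:
        writer.writerow(player)  # missing keys are written as empty strings
    return buffer.getvalue()


# Made-up records (hypothetical field names and values, not real PDGA data).
sample_players = [
    {"pdga_number": "00001", "first_name": "Example", "rating": "1010",
     "state_prov": "CO", "unused_field": "x"},
    {"pdga_number": "00002", "first_name": "Sample", "rating": "987",
     "state_prov": "OR"},
]

columns = ["pdga_number", "first_name", "rating", "state_prov"]
csv_text = players_to_csv(sample_players, columns)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch free of side effects; in a real run the same writer could be pointed at an open file instead.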

Misc

Project Milestones

1. Pull down JSON of Eagle McMahon via Requests Library

2. Save sessid & session_name into a txt file so there is no need to hit the API for login every time

3. Save Eagle McMahon's JSON into a CSV

4. Pull down JSON for all US MPO players, saving it into a CSV along the way

5. Clean the player info from the API via the CSV library

6. Get state population CSV (Census Bureau, including 54 "states")

7. Create a dataframe that merges the CSVs with population data and the number of players rated 1000+

8. Create heat map data viz (Notebook? Seaborn/Plotly/Dash/Flourish)

9. Refactor code (ex: truly use constants module)
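
Milestone 2 (caching the login session so the API is not hit for login on every run) might look something like the sketch below. It is an assumption-laden illustration rather than the project's code: the helper names, the choice to store the two values as JSON text inside the txt file, and the `fake_login` stand-in for the real login request are all hypothetical. Session expiry is not handled here.

```python
import json
import tempfile
from pathlib import Path


def save_session(path, sessid, session_name):
    """Store the two session values as JSON text in a plain text file."""
    payload = {"sessid": sessid, "session_name": session_name}
    Path(path).write_text(json.dumps(payload))


def load_session(path):
    """Return the cached session dict, or None if there is no usable cache."""
    file = Path(path)
    if not file.exists():
        return None
    try:
        data = json.loads(file.read_text())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if "sessid" not in data or "session_name" not in data:
        return None
    return data


def get_session(path, login):
    """Reuse the cached session if present; otherwise call login() and cache it."""
    cached = load_session(path)
    if cached is not None:
        return cached
    session = login()
    save_session(path, session["sessid"], session["session_name"])
    return session


# Demo with a fake login (hypothetical values) that counts how often it is called.
login_calls = []


def fake_login():
    login_calls.append(1)
    return {"sessid": "abc123", "session_name": "SESSexample"}


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "session.txt"
    first = get_session(cache_file, fake_login)   # triggers the login
    second = get_session(cache_file, fake_login)  # served from the txt file

print(len(login_calls))
```

Passing the login function in as an argument keeps the caching logic testable without touching the network.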

Things I Learned (or at least started learning)

  1. HTTP Status Codes
  2. Requests Library
  3. Working with JSON
  4. Data Visualization Options: Tableau, Plotly, Flourish, Chartify, etc
  5. New Lines - These primarily indicate the end of a line
  6. Accessing APIs - Put in safety measures & sanity check results before scaling up (I pulled the same 200 records multiple times)
  7. Census Data is Helpful!
  8. When reading data from CSV to Dataframe, use engine='python'
  9. Iterating Through Dataframes
  10. Adding Columns to Dataframes Based on Criteria
  11. Aggregations in Pandas
  12. Merging Data in Pandas
  13. Use a constants module heavily for consistency - you get more robust code, more quickly and easily
  14. Exception Handling
  15. Reading numeric data into Pandas - watch out for commas making strings/Objects
  16. Adding arguments to scripts - I used the argparse library
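
As an illustration of item 16, here is a minimal argparse sketch. The flag names (`--min-rating`, `--output`, `--refresh-login`) and their defaults are hypothetical and chosen only for the example; they are not the options this project's main.py actually accepts.

```python
import argparse


def build_parser():
    """Build a command-line parser with a few illustrative (hypothetical) options."""
    parser = argparse.ArgumentParser(
        description="Pull PDGA player data and write it to a CSV."
    )
    parser.add_argument(
        "--min-rating",
        type=int,          # argparse converts the string and rejects non-integers
        default=1000,
        help="only count players rated at or above this value",
    )
    parser.add_argument(
        "--output",
        default="players.csv",
        help="path of the CSV file to write",
    )
    parser.add_argument(
        "--refresh-login",
        action="store_true",  # False unless the flag is given
        help="ignore any cached session and log in again",
    )
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--min-rating", "950", "--output", "mpo.csv"])
print(args.min_rating, args.output, args.refresh_login)
```

In a script, calling `parse_args()` with no arguments reads from `sys.argv` instead.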

Potential Next Steps

  1. For visualization: Dash, Plotly, Flourish, Chartify instead of Tableau
  2. Analyze European disc golf growth (Pick top few countries -> tournament & course growth)
  3. Fix a bug I noticed (it doesn't currently matter) when pulling player stats data: the 'prize' & 'last_modified' columns are affected, ex: PDGA number 148486
  4. Move argument validation & parsing outside of main.py into its own module

Credits

Thank you Nathan Hoover and Jan Van Bruggen for your huge help on this project!
