# Capstone Two - Data Wrangling

In this notebook, I describe and outline the methods used to acquire the two main data sources for this project. The first dataset contains information on all MLS players as listed on the [official website](https://www.mlssoccer.com/players?sort=name&order=ASC), including each player's hometown (city, state/province, and country).

The second dataset holds records of US youth national champions, for both boys and girls, dating back to the mid-1930s. These records include the city and state of each club. The goal is to examine relationships between the hometowns of MLS players (where it is assumed the professionals played their youth soccer) and the locations of the clubs that won national titles.
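As a rough illustration of how the two datasets could be compared, the sketch below counts MLS players per home state and youth national titles per state, then lines the two counts up by state name. It assumes the column names that appear in the outputs below (`Hometown: State/Province` and `Club State`) and uses only a handful of rows copied from those outputs, so the numbers it prints are not results. The helper name `players_vs_titles` is hypothetical.

```python
import pandas as pd

# A few rows copied from the outputs in this notebook (illustration only, not the full data)
players = pd.DataFrame({
    "Player Name": ["A.J. DeLaGarza", "Aaron Herrera", "Aaron Long",
                    "Abdul Rwatubyaye", "Zac McGraw", "Zarek Valentin FC"],
    "Hometown: State/Province": ["Maryland", "New Mexico", "California",
                                 None, "California", "Pennsylvania"],
})
champions = pd.DataFrame({
    "Championship Year": [1935, 1936, 1937, 1938, 1939],
    "Club State": ["Massachusetts", "New York", "New York", "Pennsylvania", "Pennsylvania"],
})


def players_vs_titles(players: pd.DataFrame, champions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: align MLS player counts and youth title counts by state."""
    # value_counts ignores missing values, so players with hometowns outside the US drop out
    player_counts = players["Hometown: State/Province"].value_counts().rename("mls_players")
    title_counts = champions["Club State"].value_counts().rename("youth_titles")
    # Concatenating on the index is an outer join: states found in only one dataset get 0
    combined = pd.concat([player_counts, title_counts], axis=1).fillna(0).astype(int)
    combined.index.name = "State"
    return combined.sort_index()


summary = players_vs_titles(players, champions)
print(summary)
```

Counting by state is a coarse first pass; matching at the city level would require the city to be separated from the club name in the youth records.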

```python
# Project-specific modules that implement the data acquisition steps
import data
import get_player_data
import us_youth_national_champions

import pandas as pd
import numpy as np
```

## MLS Player Data

The MLS player data was collected by web scraping and parsing JSON to extract the desired fields.
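The structure of the scraped JSON is not shown in this notebook. As a minimal sketch, assuming a hypothetical payload in which each player is an object with a nested `hometown` object, `pandas.json_normalize` can flatten the records into columns like those in the table below. The field names in the payload (`name`, `club`, `position`, `hometown.city`, and so on) are assumptions, not the MLS site's actual schema.

```python
import json

import pandas as pd

# Hypothetical JSON payload; the real response from the MLS site may be structured differently
raw = """
[
  {"name": "Aaron Long", "club": "New York Red Bulls", "position": "Defender",
   "hometown": {"city": "Oak Hills", "state": "California", "country": "United States of America"}},
  {"name": "Abdul Rwatubyaye", "club": "Colorado Rapids", "position": "Defender",
   "hometown": {"city": "Kigali", "state": null, "country": "Rwanda"}}
]
"""

records = json.loads(raw)

# json_normalize flattens nested objects into dotted column names such as "hometown.city"
players = pd.json_normalize(records)

# Rename the flattened columns to the labels used in the cleaned CSV
players = players.rename(columns={
    "name": "Player Name",
    "club": "MLS Club",
    "position": "Position",
    "hometown.city": "Hometown: City",
    "hometown.state": "Hometown: State/Province",
    "hometown.country": "Hometown: Country",
})
print(players)
```

A JSON `null` becomes a missing value, which matches how players without a state or province appear in the table below.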

```python
# Load the cleaned MLS player table produced by the scraping step
mls_players = pd.read_csv('mls_players_cleaned_filled.csv', encoding='unicode_escape')
```

```python
mls_players
```

| | Player Name | MLS Club | Jersey Number | Position | Hometown: City | Hometown: State/Province | Hometown: Country | Age | Height (in.) | Weight (lbs.) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A.J. DeLaGarza | Inter Miami CF | 20.0 | Defender | Bryans Road | Maryland | United States of America | 33 | 5' 9" | 150.0 |
| 1 | Aaron Herrera | Real Salt Lake | 22.0 | Defender | Las Cruces | New Mexico | United States of America | 23 | 5' 11" | 160.0 |
| 2 | Aaron Long | New York Red Bulls | 33.0 | Defender | Oak Hills | California | United States of America | 28 | 6' 1" | 175.0 |
| 3 | Aaron Schoenfeld | Minnesota United FC | 12.0 | Forward | Knoxville | Tennessee | United States of America | 30 | 6' 4" | 190.0 |
| 4 | Abdul Rwatubyaye | Colorado Rapids | 14.0 | Defender | Kigali | NaN | Rwanda | 24 | 6' 1" | 175.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 757 | Zac MacMath | Real Salt Lake | 18.0 | Goalkeeper | St. Petersburg | Florida | United States of America | 29 | 6' 2" | 190.0 |
| 758 | Zac McGraw | Portland Timbers | 85.0 | Defender | Torrance | California | United States of America | 23 | 6' 4" | 205.0 |
| 759 | Zachary Brault-Guillard | Montreal Impact | 15.0 | Defender | Lyon | NaN | France | 21 | 5' 7" | 145.0 |
| 760 | Zarek Valentin FC | Houston Dynamo | 4.0 | Defender | Lancaster | Pennsylvania | United States of America | 29 | 5' 11" | 156.0 |
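The `Height (in.)` column holds text in feet-and-inches form (for example `5' 9"`) rather than a number of inches, so pandas treats it as a string column and leaves it out of numeric summaries. A minimal sketch of converting those strings to total inches, assuming every non-missing value follows the `F' I"` pattern shown above; the helper name `height_to_inches` is hypothetical.

```python
import numpy as np
import pandas as pd


def height_to_inches(heights: pd.Series) -> pd.Series:
    """Hypothetical helper: convert strings like 5' 9" to total inches.

    Values that are missing or do not match the pattern become NaN.
    """
    # Capture the feet and inches digits, e.g. 6' 1" -> ("6", "1")
    parts = heights.str.extract(r"(\d+)'\s*(\d+)\"")
    feet = pd.to_numeric(parts[0], errors="coerce")
    inches = pd.to_numeric(parts[1], errors="coerce")
    return feet * 12 + inches


heights = pd.Series(["5' 9\"", "6' 4\"", np.nan])
print(height_to_inches(heights))
```

Applied to the full table this would be `mls_players['Height (in.)'] = height_to_inches(mls_players['Height (in.)'])`, after which the values would match the column's "(in.)" label and be included by `describe()`.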


```python
mls_players.describe()
```

| | Jersey Number | Age | Weight (lbs.) |
|---|---|---|---|
| count | 725.0 | 762.0 | 715.0 |
| mean | 21.583448 | 25.439633 | 166.629371 |
| std | 18.839105 | 5.037797 | 18.487818 |
| min | 1.0 | 0.0 | 104.0 |
| 25% | 9.0 | 22.0 | 154.0 |
| 50% | 18.0 | 25.0 | 165.0 |
| 75% | 27.0 | 29.0 | 180.0 |
| max | 99.0 | 38.0 | 236.0 |


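The summary statistics show a minimum Age of 0, so some players have an Age value of zero, which cannot be a real age. The counts also differ by column: Age has 762 non-null values, while Jersey Number and Weight (lbs.) have only 725 and 715, so those columns contain missing values.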

```python
mls_players.loc[mls_players['Age'] == 0.0]
```

| | Player Name | MLS Club | Jersey Number | Position | Hometown: City | Hometown: State/Province | Hometown: Country | Age | Height (in.) | Weight (lbs.) |
|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Adam Saldana | LA Galaxy | NaN | Forward | Panorama City | California | United States of America | 0 | NaN | NaN |
| 351 | Jalen Neal | LA Galaxy | NaN | Forward | Lakewood | California | United States of America | 0 | NaN | NaN |
| 518 | Marcus Ferkranus | LA Galaxy | NaN | Forward | Santa Clarita | California | United States of America | 0 | NaN | NaN |
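All three rows are LA Galaxy players with no jersey number, height, or weight recorded, which suggests that an Age of 0 is a placeholder for missing data rather than a real value. Assuming that interpretation, a minimal sketch that marks those ages as missing so they no longer pull down the column's mean and minimum; the helper name `mask_zero_ages` is hypothetical.

```python
import pandas as pd


def mask_zero_ages(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df where an Age of 0 is set to NaN."""
    out = df.copy()
    # Series.mask replaces values where the condition is True with NaN
    out["Age"] = out["Age"].mask(out["Age"] == 0)
    return out


# Two rows copied from the outputs above (illustration only)
sample = pd.DataFrame({
    "Player Name": ["Adam Saldana", "Aaron Long"],
    "Age": [0, 28],
})
cleaned = mask_zero_ages(sample)
# The mean now skips the masked age instead of averaging in a zero
print(cleaned["Age"].mean())
```

Returning a copy keeps the original DataFrame unchanged, so the raw values remain available for comparison.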


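```python
# numeric_only=True limits the mean to numeric columns; pandas >= 2.0 raises a TypeError otherwise
mls_players.groupby('MLS Club').mean(numeric_only=True)
```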

| MLS Club | Jersey Number | Age | Weight (lbs.) |
|---|---|---|---|
| Atlanta United FC | 17.166667 | 25.266667 | 168.111111 |
| Chicago Fire FC | 19.666667 | 23.827586 | 168.428571 |
| Colorado Rapids | 20.586207 | 25.2 | 160.862069 |
| Columbus Crew SC | 15.896552 | 27.033333 | 168.466667 |
| D.C. United | 22.291667 | 26.538462 | 167.038462 |
| FC Cincinnati | 22.944444 | 25.8 | 171.9 |
| FC Dallas | 18.366667 | 24.1875 | 165.78125 |
| Houston Dynamo | 17.571429 | 26.172414 | 166.444444 |
| Inter Miami CF | 16.62069 | 26.0 | 166.793103 |
| LA Galaxy | 24.214286 | 23.03125 | 165.206897 |
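Because Jersey Number and Weight (lbs.) have missing values, each club's mean may rest on a different number of players. A minimal sketch, using pandas named aggregation, that reports roster size and the number of recorded values next to the mean; the output column names (`players`, `weights_recorded`, `mean_weight`) are hypothetical.

```python
import numpy as np
import pandas as pd

# Rows copied from the outputs above (illustration only)
sample = pd.DataFrame({
    "MLS Club": ["Real Salt Lake", "Real Salt Lake", "Inter Miami CF", "LA Galaxy"],
    "Weight (lbs.)": [160.0, 190.0, 150.0, np.nan],
})

# Named aggregation: each keyword becomes an output column built from (source column, function)
by_club = sample.groupby("MLS Club").agg(
    players=("Weight (lbs.)", "size"),             # rows per club, missing values included
    weights_recorded=("Weight (lbs.)", "count"),   # non-missing weights only
    mean_weight=("Weight (lbs.)", "mean"),         # mean over the non-missing weights
)
print(by_club)
```

A club whose weights are all missing gets a `NaN` mean with `weights_recorded` equal to 0, which makes gaps like the three LA Galaxy rows above easy to spot.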


## US Youth National Champions Club Data

```python
youth_national_champions_clubs = pd.read_csv('us_youth_national_champs_cleaned.csv')
```

```python
youth_national_champions_clubs.head()
```

| | Age Group | Championship Year | Club Name | Club Region | Club State |
|---|---|---|---|---|---|
| 0 | BOYS UNDER 19 | 1935 | Reliable Juniors of New Bedford | MA | Massachusetts |
| 1 | BOYS UNDER 19 | 1936 | Hatikvoh Juniors of Brooklyn | NY | New York |
| 2 | BOYS UNDER 19 | 1937 | Hatikvoh Juniors of Brooklyn | NY | New York |
| 3 | BOYS UNDER 19 | 1938 | Lighthouse Boys' Club of Philadelphia | PA | Pennsylvania |
| 4 | BOYS UNDER 19 | 1939 | Avella Juniors of Avella | PA | Pennsylvania |
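The table has no separate city column; in the rows shown, the city appears at the end of the club name after "of" (for example, Reliable Juniors of New Bedford). A minimal sketch that pulls out that trailing text with `Series.str.extract`, assuming the "... of <City>" pattern holds; names that do not follow it come back as NaN and would need manual review. The last club name in the example is hypothetical.

```python
import pandas as pd

# Club names copied from the output above, plus one hypothetical name without "of"
clubs = pd.Series([
    "Reliable Juniors of New Bedford",
    "Hatikvoh Juniors of Brooklyn",
    "Lighthouse Boys' Club of Philadelphia",
    "Avella Juniors of Avella",
    "Hypothetical FC",  # hypothetical: no "of", so no city can be extracted
])

# The greedy ".*" makes the match use the last " of " in the name;
# expand=False returns a Series because there is a single capture group
cities = clubs.str.extract(r".* of (.+)$", expand=False)
print(cities)
```

Extracted names may still need mapping before they can be matched to MLS hometowns; Brooklyn, for instance, is a borough of New York City rather than a separate city.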
