# FIFA 22 Player Data

This notebook demonstrates some of the data processing techniques you'll use in the homework on the male player data from FIFA 22. The data was originally obtained from [Kaggle](https://www.kaggle.com/stefanoleone992/fifa-22-complete-player-dataset).

In [1]:
```python
import csv
import math
from datetime import datetime

import matplotlib.pyplot as plt
import numpy as np

# Activate the inline backend first so it cannot override the figure settings below
%matplotlib inline
plt.rcParams['figure.dpi'] = 300
```

The functions below read the data from the CSV file and convert the numeric columns we need to floats. Only `age`, `weight_kg`, and `height_cm` are converted; every other column stays a string.

In [2]:
```python
def read_records(filename):
    """
    Read a CSV file into a list of dictionaries, where each
    dictionary has keys taken from the column names in the file.
    """
    records = []
    with open(filename, newline='') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            convert_row(row)  # converts the numeric fields in place
            records.append(row)

    return records


def convert_row(row):
    """
    Convert the numeric fields of a row from strings to floats, in place.
    All other fields are left as strings.
    """
    row['age'] = float(row['age'])
    row['weight_kg'] = float(row['weight_kg'])
    row['height_cm'] = float(row['height_cm'])
```
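Because `csv.DictReader` returns every value as a string, any column not listed in `convert_row` stays a string (note `'overall': '93'` in the output further down). The sketch below shows one possible way to generalize the conversion with a field-to-type mapping. The mapping `FIELD_TYPES`, the helper `convert_fields`, and the two-row sample are hypothetical and made up for illustration; empty cells are turned into `None` instead of raising a `ValueError`.

```python
import csv
import io

# Hypothetical sample in the same shape as players_22.csv (values are made up)
SAMPLE_CSV = """short_name,age,height_cm,weight_kg,overall,club_loaned_from
A. Player,25,180,75,70,
B. Player,31,172,68,,Some Club
"""

# Hypothetical mapping from column name to the type it should hold
FIELD_TYPES = {
    'age': float,
    'height_cm': float,
    'weight_kg': float,
    'overall': int,
}


def convert_fields(row, field_types=FIELD_TYPES):
    """Convert the listed fields of `row` in place; empty cells become None."""
    for field, to_type in field_types.items():
        if field not in row:
            continue  # skip columns this file does not have
        value = row[field].strip()
        row[field] = to_type(value) if value else None
    return row


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [convert_fields(row) for row in reader]
print(rows[0]['age'], rows[0]['overall'], rows[1]['overall'])  # 25.0 70 None
```

Columns that are not in the mapping, such as `club_loaned_from`, are left untouched as strings.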


The following cell reads in the data:

In [3]:
```python
data = read_records('../data/players_22.csv')
print('Read in {} records'.format(len(data)))
```

Read in 19239 records


The next cell extracts the height of every player into a list:

In [4]:
```python
heights = [row['height_cm'] for row in data]
```
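With the heights in a plain list, they can be summarized before plotting. A minimal sketch using the standard-library `statistics` module; the `summarize` helper and the sample values are hypothetical, and in the notebook you would call `summarize(heights)` instead. Since `numpy` is already imported, `np.mean(heights)` and `np.median(heights)` would give the same mean and median.

```python
import statistics


def summarize(values):
    """Return basic summary statistics for a non-empty list of numbers."""
    return {
        'count': len(values),
        'min': min(values),
        'max': max(values),
        'mean': statistics.mean(values),
        'median': statistics.median(values),
    }


# Made-up heights for illustration; in the notebook you would pass `heights`
sample_heights = [170.0, 180.0, 185.0, 175.0]
print(summarize(sample_heights))
# {'count': 4, 'min': 170.0, 'max': 185.0, 'mean': 177.5, 'median': 177.5}
```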



This function groups the records by the value of a given field. For example, `height_groups = group_data_by_field(data, "height_cm")` creates a dictionary whose keys are height values and whose values are lists of the rows with that height. You can then use `height_groups[170.0]` to retrieve the data for all players who are 170 cm tall.

In [5]:
```python
def group_data_by_field(datalist, field):
    """
    Given a list of dictionaries, return a dictionary of lists where each
    sublist contains all dictionaries with the same value of that field,
    and the key of that entry is that value.
    """
    output = dict()
    for d in datalist:
        # get the value of the field for this entry
        field_value = d[field]
        # if this value has not been seen before, create a new output entry for it
        if field_value not in output:
            output[field_value] = []
        # append the entry to the list for this field value
        output[field_value].append(d)
    return output
```
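The same grouping can be written more compactly with `collections.defaultdict`, which creates the empty list the first time a key is seen. The sketch below is an equivalent alternative, not something the homework requires; `group_by_field` and the three sample rows are hypothetical. The last lines show one use of the groups: counting how many players share each height.

```python
from collections import defaultdict


def group_by_field(datalist, field):
    """Same result as group_data_by_field, using defaultdict to create lists on demand."""
    output = defaultdict(list)
    for d in datalist:
        output[d[field]].append(d)
    return dict(output)  # plain dict again, so looking up a missing key raises KeyError


# Hypothetical rows for illustration
players = [
    {'short_name': 'A', 'height_cm': 170.0},
    {'short_name': 'B', 'height_cm': 180.0},
    {'short_name': 'C', 'height_cm': 170.0},
]

groups = group_by_field(players, 'height_cm')
print([p['short_name'] for p in groups[170.0]])  # ['A', 'C']

# Number of players at each height, sorted by height
counts = {height: len(rows) for height, rows in sorted(groups.items())}
print(counts)  # {170.0: 2, 180.0: 1}
```

Converting back to a plain `dict` keeps the behavior of `group_data_by_field`: asking for a height nobody has raises `KeyError` instead of silently returning an empty list.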
        

In [6]:
```python
height_groups = group_data_by_field(data, "height_cm")
```

In [7]:
```python
height_groups[170.0]
```

[{'sofifa_id': '158023',
  'player_url': 'https://sofifa.com/player/158023/lionel-messi/220002',
  'short_name': 'L. Messi',
  'long_name': 'Lionel Andrés Messi Cuccittini',
  'player_positions': 'RW, ST, CF',
  'overall': '93',
  'potential': '93',
  'value_eur': '78000000.0',
  'wage_eur': '320000.0',
  'age': 34.0,
  'dob': '1987-06-24',
  'height_cm': 170.0,
  'weight_kg': 72.0,
  'club_team_id': '73.0',
  'club_name': 'Paris Saint-Germain',
  'league_name': 'French Ligue 1',
  'league_level': '1',
  'club_position': 'RW',
  'club_jersey_number': '30',
  'club_loaned_from': '',
  'club_joined': '2021-08-10',
  'club_contract_valid_until': '2023',
  'nationality_id': '52',
  'nationality_name': 'Argentina',
  'nation_team_id': '1369.0',
  'nation_position': 'RW',
  'nation_jersey_number': '10',
  'preferred_foot': 'Left',
  'weak_foot': '4',
  'skill_moves': '4',
  'international_reputation': '5',
  'work_rate': 'Medium/Low',
  'body_type': 'Unique',
  'real_face': 'Yes',
  'release