# Predict - Python Data Structures

This project covers Python data structures. The notebook first transforms the raw data for you; you then need to write the 9 functions specified in the project instructions.

NB: Make sure the football players CSV file (loaded as `football_players-a-26.csv` in the import cell) is in the same directory as this notebook.

**PROJECT RULES**:

* You may not import any external packages (except for pandas) - all of the functions need to be solved WITHOUT THE USE OF ANY OTHER EXTERNAL MODULES.
* Most importantly: your functions need to return the answer (not just print it out).
* Do not add or remove any cells from this notebook. Use another notebook to experiment in (or in which to do your workings), but your submission may not have any additional cells or functions.
* Only fill in code where the #YOUR CODE tags appear. No code outside these areas (or outside the given functions) will be marked.

## Transform Data

### Import Data

In [1]:
import pandas as pd

# Load data - pass 'Name' as our index column
load_df = pd.read_csv('football_players-a-26.csv', index_col='Name', low_memory=False).sample(frac=1)

# Create dataframe called df
df = pd.DataFrame(load_df)

In [2]:
df.head()

| Name | Age | Nationality | Overall | Acceleration | Aggression | Agility | Balance | Ball control | Composure | Crossing | ... | Short passing | Shot power | Sliding tackle | Sprint speed | Stamina | Standing tackle | Strength | Vision | Volleys | Preferred Positions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| P. Gerkens | 22 | Belgium | 71 | 52 | 58 | 67 | 72 | 68 | 66 | 70 | ... | 71 | 70 | 63 | 57 | 77 | 66 | 57 | 70 | 64 | CDM CM |
| J. Saunders | 36 | Puerto Rico | 66 | 53 | 41 | 34 | 46 | 15 | 53 | 13 | ... | 33 | 23 | 17 | 51 | 39 | 10 | 65 | 10 | 14 | GK |
| J. Pupe | 20 | Belgium | 58 | 63 | 43 | 46 | 50 | 49 | 49 | 23 | ... | 48 | 41 | 51 | 55 | 51 | 64 | 65 | 33 | 24 | CB |
| M. Camporese | 25 | Italy | 69 | 65 | 67 | 66 | 61 | 53 | 67 | 14 | ... | 53 | 49 | 71 | 67 | 65 | 73 | 67 | 23 | 11 | CB |
| J. Allen | 22 | United States | 61 | 64 | 70 | 77 | 70 | 64 | 57 | 49 | ... | 62 | 54 | 59 | 75 | 61 | 59 | 60 | 58 | 39 | CM CAM LW |


### Create Position Type Column

In [3]:
def position_type(s):
    
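    """Convert a string of position abbreviations into a position type:
    forward, midfielder, back or goalkeeper."""
    
    # s[-2] is the second-to-last character; this presumes each raw value
    # ends with one trailing character (such as a space) after the last position.
    if s[-2] in ('T', 'W'):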
        return 'Forward'
    elif s[-2] == 'M':
        return 'Midfielder'
    elif s[-2] == 'B':
        return 'Back'
    else:
        return 'GoalKeeper'

# Create position type column
df['Preferred Positions Type'] = df['Preferred Positions'].apply(position_type)
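
To make the classification rule concrete, the sketch below calls a copy of `position_type` on a few hand-written strings. It assumes, as the `s[-2]` index suggests, that each raw 'Preferred Positions' value ends with a trailing space, so `s[-2]` is the last letter of the last listed position. The sample strings are illustrative and not taken from the CSV.

```python
def position_type(s):
    """Copy of the notebook's classifier, repeated so the example runs on its own."""
    if s[-2] in ('T', 'W'):
        return 'Forward'
    elif s[-2] == 'M':
        return 'Midfielder'
    elif s[-2] == 'B':
        return 'Back'
    else:
        return 'GoalKeeper'

# Hypothetical raw values; note the assumed trailing space on each one.
samples = ['CDM CM ', 'GK ', 'CB ', 'ST LW ', 'CM CAM LW ']
labels = {raw: position_type(raw) for raw in samples}

for raw, label in labels.items():
    print(repr(raw), '->', label)
```

Only the last listed position decides the label, and any final letter other than T, W, M or B falls through to 'GoalKeeper'.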

### Transform Attribute Columns to Floats

In [4]:
# Select all attribute columns
cols = ['Overall', 'Acceleration', 'Aggression',
       'Agility', 'Balance', 'Ball control', 'Composure', 'Crossing', 'Curve',
       'Dribbling', 'Finishing', 'Free kick accuracy', 'GK diving',
       'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes',
       'Heading accuracy', 'Interceptions', 'Jumping', 'Long passing',
       'Long shots', 'Marking', 'Penalties', 'Positioning', 'Reactions',
       'Short passing', 'Shot power', 'Sliding tackle', 'Sprint speed',
       'Stamina', 'Standing tackle', 'Strength', 'Vision', 'Volleys']

def to_float(x):    
    "Transforms attribute columns to type float"
    
    if type(x) is int:
        return float(x)
    else:
        return float(x[0:2])

# Note: DataFrame.applymap is deprecated from pandas 2.1 in favour of DataFrame.map
df[cols] = df[cols].applymap(to_float)
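
As a quick illustration of what the converter does to individual cells, the sketch below runs a copy of `to_float` on a few invented values. The string forms such as '70+2' are an assumption about why the function slices the first two characters; the actual contents of the CSV are not shown in this notebook.

```python
def to_float(x):
    """Copy of the notebook's converter, repeated so the example runs on its own."""
    if type(x) is int:
        return float(x)
    else:
        return float(x[0:2])

# Hypothetical cell values: a plain int, a numeric string,
# and two strings with an assumed modifier suffix.
raw_values = (71, '68', '70+2', '65-3')
converted = [to_float(v) for v in raw_values]
print(converted)
```

Because only the first two characters are kept, this approach relies on every string score having two digits; a value such as '9+1' would raise a `ValueError`.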

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17981 entries, P. Gerkens to Silvosinho
Data columns (total 39 columns):
Age                         17981 non-null int64
Nationality                 17981 non-null object
Overall                     17981 non-null float64
Acceleration                17981 non-null float64
Aggression                  17981 non-null float64
Agility                     17981 non-null float64
Balance                     17981 non-null float64
Ball control                17981 non-null float64
Composure                   17981 non-null float64
Crossing                    17981 non-null float64
Curve                       17981 non-null float64
Dribbling                   17981 non-null float64
Finishing                   17981 non-null float64
Free kick accuracy          17981 non-null float64
GK diving                   17981 non-null float64
GK handling                 17981 non-null float64
GK kicking                  17981 non-null float64
GK positioning    

## Function 1

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)

In [50]:
### START FUNCTION 1

def best_defender_1(rank, max_age):
    # YOUR CODE HERE
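    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # Keep defenders under max_age, then rank by Overall and Name (both descending)
    return df[(df['Preferred Positions Type'] == 'Back') & (df.Age < max_age)]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]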

### END FUNCTION 1

In [51]:
print(best_defender_1(10, 35))

T. Alderweireld


In [52]:
best_defender_1(10, 35)

'T. Alderweireld'
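
The filter, sort and pick-by-rank pattern used here can also be written with plain Python lists and dictionaries. The sketch below is only an illustration on invented records (the names, ages and scores are made up, and `nth_defender` is a hypothetical helper, not part of the project).

```python
# Toy records standing in for DataFrame rows; all values are invented.
players = [
    {'Name': 'A. Alpha', 'Age': 24, 'Overall': 85.0, 'Type': 'Back'},
    {'Name': 'B. Bravo', 'Age': 29, 'Overall': 85.0, 'Type': 'Back'},
    {'Name': 'C. Charlie', 'Age': 36, 'Overall': 90.0, 'Type': 'Back'},
    {'Name': 'D. Delta', 'Age': 22, 'Overall': 88.0, 'Type': 'Forward'},
    {'Name': 'E. Echo', 'Age': 31, 'Overall': 80.0, 'Type': 'Back'},
]

def nth_defender(records, rank, max_age):
    """Return the name of the rank-th defender younger than max_age."""
    eligible = [p for p in records if p['Type'] == 'Back' and p['Age'] < max_age]
    # Tuple keys compare Overall first, then Name; reverse=True makes both descending.
    ranked = sorted(eligible, key=lambda p: (p['Overall'], p['Name']), reverse=True)
    return ranked[rank - 1]['Name']  # rank is 1-based, list indices are 0-based

first = nth_defender(players, 1, 35)
second = nth_defender(players, 2, 35)
print(first, second)
```

The two defenders tied on 85.0 are separated by name in descending order, so 'B. Bravo' ranks ahead of 'A. Alpha'. As with the pandas version, asking for a rank larger than the number of eligible players raises an `IndexError`.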

## Function 2

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)
* Has an aggression score below a certain level (max_aggression)
* Has a stamina score above a certain level (min_stamina)

In [48]:
### START FUNCTION 2

def best_defender_2(rank, max_age, max_aggression, min_stamina):
    # YOUR CODE HERE
    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # Keep defenders that satisfy all three limits, then rank by Overall and Name (both descending)
    return df[(df['Preferred Positions Type'] == 'Back') & (df.Age < max_age) & (df.Aggression < max_aggression) & (df.Stamina > min_stamina)]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]

### END FUNCTION 2

In [49]:
best_defender_2(10, 30, 80, 60)

'K. Walker'

In [53]:
best_defender_2(10, 30, 80, 60)

'K. Walker'

## Function 3

Build an algorithm that identifies the nth ranked (rank) defender in the world - sorted by 'Overall' then 'Name' (both descending order)
* Under a certain age (max_age)
* Does not play for a certain national team (team), matched against the 'Nationality' column

In [55]:
### START FUNCTION 3

def best_defender_3(rank, max_age, team):
    # YOUR CODE HERE
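    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # Keep defenders under max_age whose Nationality differs from team, then rank them
    return df[(df['Preferred Positions Type'] == 'Back') & (df.Age < max_age) & (df.Nationality != team)]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]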

### END FUNCTION 3

In [56]:
best_defender_3(10, 25, 'Argentina')

'A. Rüdiger'

In [57]:
best_defender_3(10, 25, 'Argentina')

'A. Rüdiger'

## Function 4

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With specified attribute (attribute_name) above a threshold (min_attribute_score)

In [84]:
### START FUNCTION 4

def best_attacker_1(rank, attribute_name, min_attribute_score):
    # YOUR CODE HERE
    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # Keep forwards whose chosen attribute exceeds the threshold, then rank them
    return df[(df[attribute_name] > min_attribute_score) & (df['Preferred Positions Type'] == 'Forward')]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]

### END FUNCTION 4

In [82]:
best_attacker_1(10, 'Balance', 50)

'P. Aubameyang'

In [85]:
best_attacker_1(10, 'Balance', 50)

'P. Aubameyang'

## Function 5

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With average of specified attributes (attribute_1_name, attribute_2_name) above a threshold (min_attributes_ave)

In [134]:
### START FUNCTION 5

def best_attacker_2(rank, attribute_1_name, attribute_2_name, min_attributes_ave):
    # YOUR CODE HERE
    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # Average the two attributes as a standalone Series, so df itself is never modified
    average = (df[attribute_1_name] + df[attribute_2_name]) / 2
    player = df[(average > min_attributes_ave) & (df['Preferred Positions Type'] == 'Forward')]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]
    return player
### END FUNCTION 5

In [135]:
best_attacker_2(10, 'Finishing', 'Balance', 80)

'S. Mané'

In [136]:
best_attacker_2(10, 'Finishing', 'Balance', 80)

'S. Mané'
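
The averaging condition can be evaluated per player without storing the average anywhere. The following is a minimal sketch on invented records using plain Python; `names_with_average_above` is a hypothetical helper and the scores are made up.

```python
# Toy forwards; all names and scores are invented for illustration.
forwards = [
    {'Name': 'F. One', 'Overall': 88.0, 'Finishing': 90.0, 'Balance': 72.0},
    {'Name': 'F. Two', 'Overall': 86.0, 'Finishing': 85.0, 'Balance': 70.0},
    {'Name': 'F. Three', 'Overall': 84.0, 'Finishing': 79.0, 'Balance': 83.0},
]

def names_with_average_above(records, attr_1, attr_2, min_average):
    """Names of players whose two-attribute average exceeds min_average, best first."""
    # The average is computed inside the condition, so the records are left untouched.
    keep = [p for p in records if (p[attr_1] + p[attr_2]) / 2 > min_average]
    ranked = sorted(keep, key=lambda p: (p['Overall'], p['Name']), reverse=True)
    return [p['Name'] for p in ranked]

result = names_with_average_above(forwards, 'Finishing', 'Balance', 80)
print(result)
```

Here 'F. Two' averages 77.5 and is excluded, while the other two average 81.0 and are kept. Not writing intermediate values back to shared data means an exception partway through cannot leave a stray column or key behind.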

## Function 6

Build an algorithm that identifies the nth ranked (rank) attacker in the world - sorted by 'Overall' then 'Name' (both descending order)
* With minimum of specified attributes (attribute_1_name, attribute_2_name) above a threshold (min_attributes_min)

In [231]:
### START FUNCTION 6

def best_attacker_3(rank, attribute_1_name, attribute_2_name, min_attributes_min):
    
    # YOUR CODE HERE
    zb = rank - 1  # rank is 1-based; index positions are 0-based
    # The minimum of two scores exceeds the threshold exactly when both scores do
    player = df[(df[attribute_1_name] > min_attributes_min) & 
                (df[attribute_2_name] > min_attributes_min) &
                (df['Preferred Positions Type'] == 'Forward')]\
        .sort_values(by=['Overall', 'Name'], ascending=False).index[zb]
    return player
    
### END FUNCTION 6

In [232]:
best_attacker_3(10, 'Balance', 'Composure', 70)

'A. Di María'

In [233]:
best_attacker_3(10, 'Balance', 'Ball control', 70)

'A. Di María'
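
Requiring the minimum of two attributes to exceed a threshold is the same as requiring each attribute to exceed it. The short check below confirms this over a grid of sample scores; the score grid and threshold are arbitrary values chosen for illustration.

```python
from itertools import product

scores = range(0, 100, 7)   # arbitrary sample scores
threshold = 70              # arbitrary threshold

# Compare "min above threshold" with "both above threshold" for every pair.
checks = [
    (min(a, b) > threshold) == (a > threshold and b > threshold)
    for a, b in product(scores, repeat=2)
]
all_agree = all(checks)
print(all_agree)
```

This is why the filter can use two separate comparisons joined with `&` instead of computing a row-wise minimum first.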

## Function 7

Build an algorithm that identifies the best n (no_defenders) defenders - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)
* Under a certain age (max_age)

Your function must return a `list` of `strings`

In [224]:
### START FUNCTION 7

def best_team_1(country, no_defenders, max_age):
    
    # YOUR CODE HERE
    # Keep defenders from the country under max_age, rank them, and take the top n names
    return list(df[(df['Preferred Positions Type'] == 'Back') & 
                   (df.Nationality == country) & (df.Age < max_age)]\
        .sort_values(by=['Overall', 'Name'], ascending=False)\
        .index.values[0:no_defenders])
    
### END FUNCTION 7

In [225]:
best_team_1('England', 3, 30)

['K. Walker', 'N. Clyne', 'E. Dier']

In [226]:
best_team_1('England', 3, 30)

['K. Walker', 'N. Clyne', 'E. Dier']
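
Taking the best n names relies on slicing, which behaves differently from picking a single rank by index. The sketch below uses a plain list holding the three names from the output above to show the difference.

```python
# The ranked names from the example output, as a plain Python list.
ranked_names = ['K. Walker', 'N. Clyne', 'E. Dier']

top_two = ranked_names[0:2]
top_five = ranked_names[0:5]  # only three names exist; the slice returns what is available

# Indexing a single position past the end raises instead of returning less.
try:
    fifth = ranked_names[4]
except IndexError:
    fifth = None

print(top_two, top_five, fifth)
```

A list-returning function built on a slice can therefore come back shorter than requested, whereas a rank-based function fails outright when too few players match.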

## Function 8

Build an algorithm that identifies the best n (no_attackers) attackers - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)
* With a specified attribute (attribute_name) above a threshold (min_attribute_score)

Your function must return a `list` of `strings`

In [227]:
### START FUNCTION 8

def best_team_2(country, no_attackers, attribute_name, min_attribute_score):
    
    # YOUR CODE HERE
    # Keep forwards from the country above the attribute threshold, rank them, take the top n names
    return list(df[(df.Nationality == country) &
                   (df[attribute_name] > min_attribute_score) &
                   (df['Preferred Positions Type'] == 'Forward')]\
        .sort_values(by=['Overall', 'Name'], ascending=False)\
        .index.values[0:no_attackers])
    
### END FUNCTION 8

In [228]:
best_team_2('England', 3, 'Finishing', 60)

['H. Kane', 'R. Sterling', 'D. Sturridge']

## Function 9

Build an algorithm that identifies the best team based on the team structure (no_attackers, no_defenders, no_midfielders, no_goalkeepers) - sorted by 'Overall' then 'Name' (both descending order)
* From a certain country (country)

Your function must return a `list` of `strings`

In [241]:
### START FUNCTION 9

def best_team_3(country, no_attackers, no_defenders, no_midfielders, no_goalkeepers):
    
    # YOUR CODE HERE
    # Restrict to the country once, then take the top names for each position type
    team_country = df[df.Nationality == country]
    p1 = list(team_country[team_country['Preferred Positions Type'] == 'Forward']\
        .sort_values(by=['Overall', 'Name'], ascending=False).index.values[0:no_attackers])
    p2 = list(team_country[team_country['Preferred Positions Type'] == 'Back']\
        .sort_values(by=['Overall', 'Name'], ascending=False).index.values[0:no_defenders])
    p3 = list(team_country[team_country['Preferred Positions Type'] == 'Midfielder']\
        .sort_values(by=['Overall', 'Name'], ascending=False).index.values[0:no_midfielders])
    p4 = list(team_country[team_country['Preferred Positions Type'] == 'GoalKeeper']\
        .sort_values(by=['Overall', 'Name'], ascending=False).index.values[0:no_goalkeepers])
    # Output order: attackers, defenders, midfielders, goalkeepers
    return p1 + p2 + p3 + p4
    
### END FUNCTION 9

In [243]:
best_team_3('Brazil', 3, 3, 4, 1)

['Neymar',
 'Coutinho',
 'Willian',
 'Thiago Silva',
 'Marcelo',
 'Miranda',
 'Casemiro',
 'Oscar',
 'Fabinho',
 'Taison',
 'Ederson']

In [244]:
best_team_3('England', 3, 4, 3, 1)

['H. Kane',
 'R. Sterling',
 'D. Sturridge',
 'G. Cahill',
 'K. Walker',
 'N. Clyne',
 'L. Baines',
 'D. Alli',
 'A. Lallana',
 'J. Henderson',
 'J. Hart']
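
The team-building step repeats one operation per position type: filter, rank, take a quota. A minimal plain-Python sketch of that idea, using a loop over (position type, quota) pairs, is shown below; the player pool is invented and `pick_team` is a hypothetical helper, not the project's required function.

```python
# Toy player pool for a single country; all values are invented.
squad_pool = [
    {'Name': 'A. Striker', 'Overall': 84.0, 'Type': 'Forward'},
    {'Name': 'B. Winger', 'Overall': 81.0, 'Type': 'Forward'},
    {'Name': 'C. Stopper', 'Overall': 83.0, 'Type': 'Back'},
    {'Name': 'D. Fullback', 'Overall': 79.0, 'Type': 'Back'},
    {'Name': 'E. Playmaker', 'Overall': 86.0, 'Type': 'Midfielder'},
    {'Name': 'F. Keeper', 'Overall': 80.0, 'Type': 'GoalKeeper'},
    {'Name': 'G. Keeper', 'Overall': 77.0, 'Type': 'GoalKeeper'},
]

def pick_team(records, no_attackers, no_defenders, no_midfielders, no_goalkeepers):
    """Return names in the order attackers, defenders, midfielders, goalkeepers."""
    quotas = [('Forward', no_attackers), ('Back', no_defenders),
              ('Midfielder', no_midfielders), ('GoalKeeper', no_goalkeepers)]
    team = []
    for position_type, quota in quotas:
        group = [p for p in records if p['Type'] == position_type]
        # Rank within the position type by Overall, then Name, both descending.
        group.sort(key=lambda p: (p['Overall'], p['Name']), reverse=True)
        team.extend(p['Name'] for p in group[:quota])
    return team

team = pick_team(squad_pool, 1, 2, 1, 1)
print(team)
```

The order of the `quotas` list fixes the order of the output, which matches the attackers, defenders, midfielders, goalkeepers order seen in the results above.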