# Filtering on conditions
---

The aim of this notebook is to learn how to filter information in larger data sets. We'll filter on a text value that we provide manually, and also on whether a value is greater or smaller than another one, such as a player's wage.
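As a minimal sketch of the two kinds of filter described above, the toy table below is filtered once on a text value and once on a numeric comparison. The names and numbers are invented for illustration and are not taken from the players data set.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'club': ['Red', 'Blue', 'Red', 'Green'],
    'wage': [100, 250, 300, 50],
})

# Filter on a text value: the comparison builds a boolean Series,
# and indexing with it keeps only the rows marked True
red_players = toy[toy['club'] == 'Red']

# Filter on a numeric comparison in exactly the same way
high_earners = toy[toy['wage'] > 200]

print(red_players)
print(high_earners)
```

Both filters follow the same pattern: build a True/False condition per row, then use it to index the table.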

In [1]:
import numpy as np
import pandas as pd

Open the data set by passing the file path of the CSV to pandas.

Calling `.head()` will allow us to check out the structure.

In [2]:
# load data
data = pd.read_csv('players_20.csv')
data.head()

Unnamed: 0,sofifa_id,player_url,short_name,long_name,age,dob,height_cm,weight_kg,nationality,club,...,lwb,ldm,cdm,rdm,rwb,lb,lcb,cb,rcb,rb
0,158023,https://sofifa.com/player/158023/lionel-messi/...,L. Messi,Lionel Andrés Messi Cuccittini,32,1987-06-24,170,72,Argentina,FC Barcelona,...,68+2,66+2,66+2,66+2,68+2,63+2,52+2,52+2,52+2,63+2
1,20801,https://sofifa.com/player/20801/c-ronaldo-dos-...,Cristiano Ronaldo,Cristiano Ronaldo dos Santos Aveiro,34,1985-02-05,187,83,Portugal,Juventus,...,65+3,61+3,61+3,61+3,65+3,61+3,53+3,53+3,53+3,61+3
2,190871,https://sofifa.com/player/190871/neymar-da-sil...,Neymar Jr,Neymar da Silva Santos Junior,27,1992-02-05,175,68,Brazil,Paris Saint-Germain,...,66+3,61+3,61+3,61+3,66+3,61+3,46+3,46+3,46+3,61+3
3,200389,https://sofifa.com/player/200389/jan-oblak/20/...,J. Oblak,Jan Oblak,26,1993-01-07,188,87,Slovenia,Atlético Madrid,...,,,,,,,,,,
4,183277,https://sofifa.com/player/183277/eden-hazard/2...,E. Hazard,Eden Hazard,28,1991-01-07,175,74,Belgium,Real Madrid,...,66+3,63+3,63+3,63+3,66+3,61+3,49+3,49+3,49+3,61+3


This data set has over 18,000 players, so we'll select a team to save some money for. We'll create a function that filters the full data set for a match with a club name. The main data set has many columns, so we'll only extract the name, wage, value, positions, overall rating, and age of our players.

In [3]:
# define a function called club, that looks for a team name and returns the selected columns

def club(team_name):
  return data[data['club'] == team_name][['short_name', 'wage_eur', 'value_eur', 'player_positions', 'overall', 'age']]

# use the club function to find the team, and sort the squad by wage bill

club('Chelsea').sort_values('wage_eur', ascending=False)

Unnamed: 0,short_name,wage_eur,value_eur,player_positions,overall,age
15,N. Kanté,235000,66000000,"CDM, CM",89,28
144,Azpilicueta,145000,25500000,"RB, CB",84,29
199,Jorginho,140000,29000000,"CM, CDM",83,27
296,O. Giroud,140000,17500000,ST,82,32
298,Willian,140000,21000000,"RW, LW",82,30
309,Pedro,140000,19500000,"RW, LW",82,31
240,M. Kovačić,125000,29000000,CM,82,25
397,Marcos Alonso,115000,15000000,"LB, LWB",81,28
263,A. Rüdiger,115000,24000000,CB,82,26
630,M. Batshuayi,110000,16000000,ST,79,25
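The same kind of selection can also be written with `.loc`, which takes the row condition and the column list in a single step. This is only a sketch on a hypothetical toy frame; `COLUMNS` and `club_loc` are names introduced here for illustration and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'short_name': ['P1', 'P2', 'P3'],
    'club': ['Chelsea', 'Arsenal', 'Chelsea'],
    'wage_eur': [100000, 90000, 140000],
    'nationality': ['X', 'Y', 'Z'],
})

# Keep the column list in one place so it is not repeated in every filter
COLUMNS = ['short_name', 'wage_eur']

def club_loc(df, team_name):
    # Rows where the club matches, restricted to the chosen columns
    return df.loc[df['club'] == team_name, COLUMNS]

squad = club_loc(toy, 'Chelsea').sort_values('wage_eur', ascending=False)
print(squad)
```

Passing the DataFrame as an argument, rather than reading a global, makes the function easier to try out on small test tables like this one.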


Now that we have all of our players, we need to find a way to take information from a player we may want to replace and use it to search the initial database.

Let's extract Jorginho's information from the database, assign his wage, position, rating, and age to variables, and finally recommend players who are rated higher and are younger than him, but who earn no more than he does.

First, let's get information and variables sorted.

In [5]:
# Extract Jorginho's information, just like we did with the team name before
jorgi = data[data['short_name'] == 'Jorginho'][['short_name', 'wage_eur', 'value_eur', 'player_positions', 'overall', 'age']]

# Assign Jorginho's wage, position, rating, and age to variables

jorgi_wage = jorgi['wage_eur'].item()
jorgi_pos = jorgi['player_positions'].item()
jorgi_rating = jorgi['overall'].item()
jorgi_age = jorgi['age'].item()
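`.item()` only works when the filtered Series holds exactly one value; otherwise it raises a `ValueError`. Short names are not guaranteed to be unique, so a small guard can make that failure easier to read. The sketch below uses invented toy data, and `single_value` is a hypothetical helper, not something the notebook defines.

```python
import pandas as pd

# Hypothetical toy data with a deliberately duplicated name
toy = pd.DataFrame({
    'short_name': ['Gabriel', 'Gabriel', 'Jorginho'],
    'wage_eur': [17000, 60000, 140000],
})

def single_value(df, name, column):
    # Select the requested column for rows matching the name
    matches = df.loc[df['short_name'] == name, column]
    # Fail with a clear message unless exactly one row matched
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one row for {name!r}, found {len(matches)}"
        )
    return matches.item()

print(single_value(toy, 'Jorginho', 'wage_eur'))
```

Calling `single_value(toy, 'Gabriel', 'wage_eur')` on this toy frame raises the `ValueError`, because two rows share that name.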

Second, we will filter the initial database with these values. To start, we need to create a longlist of players whose listed positions match Jorginho's exactly. From there we need to slim the list down to meet the other criteria.

In [7]:
# create a list to match Jorginho's position

longlist = data[data['player_positions'] == jorgi_pos][['short_name', 'wage_eur', 'value_eur', 'player_positions', 'overall', 'age']]

# remove players rated the same as or lower than Jorginho

removals = longlist[longlist['overall'] <= jorgi_rating].index
longlist.drop(removals, inplace=True)

# now remove players who earn more than Jorginho
removals = longlist[longlist['wage_eur'] > jorgi_wage].index
longlist.drop(removals, inplace=True)

# then remove players who are the same age or older
removals = longlist[longlist['age'] >= jorgi_age].index
longlist.drop(removals, inplace=True)

# then show potential replacements, sorted by lowest wages
longlist.sort_values('wage_eur')

Unnamed: 0,short_name,wage_eur,value_eur,player_positions,overall,age
58,M. Verratti,140000,54500000,"CM, CDM",86,26
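The three rounds of removals can also be expressed as one combined condition, joining the "keep" rules with `&`. The sketch below uses invented toy data and reference values (`ref_wage`, `ref_rating`, `ref_age` are hypothetical names) and mirrors the same rules: rated strictly higher, wage no higher, strictly younger.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'short_name': ['P1', 'P2', 'P3', 'P4'],
    'wage_eur': [140000, 150000, 90000, 80000],
    'overall': [86, 88, 80, 85],
    'age': [26, 24, 22, 30],
})

# Reference values standing in for the player being replaced
ref_wage, ref_rating, ref_age = 140000, 83, 27

# Each comparison is wrapped in parentheses because & binds
# more tightly than the comparison operators
keep = (
    (toy['overall'] > ref_rating)
    & (toy['wage_eur'] <= ref_wage)
    & (toy['age'] < ref_age)
)

shortlist = toy[keep].sort_values('wage_eur')
print(shortlist)
```

Writing the rules as what to keep, instead of what to drop, leaves the original table untouched and puts all the criteria in one place.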


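We have one player that meets the criteria: Verratti! Note that he earns the same wage as Jorginho, because the wage filter only removes players who earn strictly more.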

How about we create a function so we can do this again and save time?

We'll add a second argument, `skill_reduction`, which lowers the rating threshold by that many points so that we can also look at players rated lower than our own. We'll set it to 0 by default.

In [9]:
def cheap_replacement(player, skill_reduction = 0):

  # Get the player to replace, using the name provided in the argument

  replacee = data[data['short_name'] == player][['short_name', 'wage_eur', 'value_eur', 'player_positions', 'overall', 'age']]

  # Assign the relevant details of this player to variables

  replacee_pos = replacee['player_positions'].item()
  replacee_wage = replacee['wage_eur'].item()
  replacee_age = replacee['age'].item()
  replacee_overall = replacee['overall'].item() - skill_reduction

  # create a longlist of players that share the position

  longlist = data[data['player_positions'] == replacee_pos][['short_name', 'wage_eur', 'value_eur', 'player_positions', 'overall', 'age']]

  # removals for rating criteria
  removals = longlist[longlist['overall'] <= replacee_overall].index
  longlist.drop(removals, inplace=True)

  # removals for higher wages
  removals = longlist[longlist['wage_eur'] > replacee_wage].index
  longlist.drop(removals, inplace=True)

  # removals for older players
  removals = longlist[longlist['age'] >= replacee_age].index
  longlist.drop(removals, inplace=True)

  # Display players that meet requirements
  return longlist.sort_values('wage_eur')

In [11]:
cheap_replacement('Kepa')

Unnamed: 0,short_name,wage_eur,value_eur,player_positions,overall,age
74,G. Donnarumma,34000,41500000,GK,85,20


In [12]:
cheap_replacement('P. Pogba', 9)

Unnamed: 0,short_name,wage_eur,value_eur,player_positions,overall,age
472,Gabriel,17000,18000000,"CM, CDM",80,25
439,Sergi Darder,33000,19000000,"CM, CDM",80,25
416,Y. Tielemans,76000,20500000,"CM, CDM",80,22
441,L. Paredes,79000,19000000,"CM, CDM",80,25
420,H. Winks,82000,20000000,"CM, CDM",80,23
331,T. Ndombele,87000,26000000,"CM, CDM",81,22
243,N. Keïta,95000,29000000,"CM, CDM",82,24
166,C. Tolisso,110000,34000000,"CM, CDM",83,24
338,E. Can,110000,23000000,"CM, CDM",81,25
173,A. Rabiot,120000,33000000,"CM, CDM",83,24
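One limitation worth keeping in mind: the position filter compares the whole `player_positions` string, so `CM, CDM` and `CDM, CM` do not match each other (the Chelsea table above lists N. Kanté as `CDM, CM` and Jorginho as `CM, CDM`). A possible alternative, sketched here on invented toy data, is to keep any player who shares at least one position. It assumes every row has a position string separated by `', '`, with no missing values.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
toy = pd.DataFrame({
    'short_name': ['P1', 'P2', 'P3'],
    'player_positions': ['CM, CDM', 'CDM, CM', 'ST'],
})

target = 'CM, CDM'

# Exact string match, as used in the notebook
exact = toy[toy['player_positions'] == target]

# Overlap match: split both sides into sets and test for a shared position
target_set = set(target.split(', '))
shares_position = toy['player_positions'].apply(
    lambda positions: bool(target_set & set(positions.split(', ')))
)
overlap = toy[shares_position]

print(exact)
print(overlap)
```

The overlap version returns a longer list, so it suits a first pass that is narrowed down afterwards by the rating, wage, and age rules.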
