## Exercise: data handling and cleaning

 - Import the data from FIFA.csv, which contains information about football players.
 - Clean the data:
   - What types are the variables? How many rows and columns does the dataset have?
   - Convert the Weight and Height variables to kg and cm.
   - Remove the "+1", "+2"... suffixes from CDM, RDM... and all the other columns at the end of the table, without handling them one by one. Methods such as split and replace can help.
   - Also fix the W/F, SM and IR variables.
   - To support a later analysis by player position, group or normalize the positions (BP, Best Position) into Portero (goalkeeper), Defensa (defender), Mediocampista (midfielder) and Delantero (forward). One option is a function posiciones_jugadores that assigns one of those 4 values to a new variable based on each player's BP.
   - We will also want metrics on player value. Transform Value and Wage into integers so that the sum and mean can be computed by team, position, nationality...
 - Null value handling:
   - Find the variables that contain null values and count how many each one has.
   - Decide what to do with them: drop them, impute a fixed value, impute a value that depends on some condition...
 - Extreme values
   - Check whether the numeric variables take values that are too high or too low, using the describe function and plots.
   - If a value is absurd (for example a negative height), correct it. Otherwise treat it in whatever way best suits a later exploratory analysis, for example by capping it at a lower value.
 - Descriptive analysis
   - Once the data is free of irregularities, null values and errors, we can start analyzing its content.
   - Compare the numeric variables across categorical variables such as position. Answer questions such as which players are the most expensive, the tallest... To do so, build a table with the sum, mean and count for each position, and draw the plot you consider best suited to showing the same information visually.
   - OVA is an overall rating of the player. Analyze whether it is related to other quality columns such as Attacking, Defending... Does a higher OVA mean a higher Value?

In [155]:
import pandas as pd

In [156]:
df_fifa = pd.read_csv('../Data/fifa.csv')

In [157]:
df_fifa.describe
# Without parentheses this echoes the bound method, whose repr embeds a preview of the DataFrame; df_fifa.shape returns the row and column counts directly.

<bound method NDFrame.describe of        Unnamed: 0      ID           Name  Age  Nationality  \
0            1954  184383      A. Pasche   26  Switzerland   
1            2225  188044  Alan Carvalho   30     China PR   
2            1959  184431    S. Giovinco   33        Italy   
3            9815  233796       J. Evans   22        Wales   
4           10074  234799     Y. Demoncy   23       France   
...           ...     ...            ...  ...          ...   
13695       11284  239074          S. Aw   21      Senegal   
13696       11964  241223        S. Mogi   21        Japan   
13697        5390  210930     Carles Gil   27        Spain   
13698         860  162993       J. Perch   34      England   
13699       15795  254882  A. Oeynhausen   18      Germany   

                             Club   BP   Position  \
0               FC Lausanne-Sport   CM     CM CDM   
1         Beijing Sinobo Guoan FC   ST   ST LW LM   
2                        Al Hilal  CAM     CAM CF   
3        

In [158]:
df_fifa.info(verbose=True)
# info() lists the dtype of every column; verbose=True requests the full per-column listing.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13700 entries, 0 to 13699
Data columns (total 102 columns):
 #    Column            Dtype  
---   ------            -----  
 0    Unnamed: 0        int64  
 1    ID                int64  
 2    Name              object 
 3    Age               int64  
 4    Nationality       object 
 5    Club              object 
 6    BP                object 
 7    Position          object 
 8    Team & Contract   object 
 9    Height            object 
 10   Weight            object 
 11   foot              object 
 12   Growth            int64  
 13   Joined            object 
 14   Loan Date End     object 
 15   Value             object 
 16   Wage              object 
 17   Release Clause    object 
 18   Contract          object 
 19   Attacking         int64  
 20   Crossing          int64  
 21   Finishing         int64  
 22   Heading Accuracy  int64  
 23   Short Passing     int64  
 24   Volleys           float64
 25   Skill             in
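The first question in the exercise (variable types, number of rows and columns) can also be answered without reading through the full listing. A minimal sketch on a small stand-in frame; `toy` and its values are made up for illustration and are not taken from fifa.csv:

```python
import pandas as pd

# Hypothetical stand-in for df_fifa; the values are illustrative only.
toy = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [26, 30, 33],
    "Height": ["5'9\"", "6'0\"", "5'4\""],
})

n_rows, n_cols = toy.shape                 # (rows, columns) as a tuple
dtype_counts = toy.dtypes.value_counts()   # number of columns per dtype

print(n_rows, n_cols)
print(dtype_counts)
```

Applied to `df_fifa`, the same two calls should report the 13700 rows and 102 columns shown in the `info()` header.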

In [159]:
df_fifa[['Height']]

Unnamed: 0,Height
0,"5'9"""
1,"6'0"""
2,"5'4"""
3,"5'10"""
4,"5'11"""
...,...
13695,"5'8"""
13696,"6'5"""
13697,"5'7"""
13698,"5'11"""


In [160]:
df_fifa[['Weight']]

Unnamed: 0,Weight
0,161lbs
1,159lbs
2,134lbs
3,152lbs
4,150lbs
...,...
13695,143lbs
13696,176lbs
13697,146lbs
13698,176lbs


In [161]:
df_fifa[['Height','Weight']]

Unnamed: 0,Height,Weight
0,"5'9""",161lbs
1,"6'0""",159lbs
2,"5'4""",134lbs
3,"5'10""",152lbs
4,"5'11""",150lbs
...,...,...
13695,"5'8""",143lbs
13696,"6'5""",176lbs
13697,"5'7""",146lbs
13698,"5'11""",176lbs


In [162]:
import re

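def height_to_cm(height):
    """Convert a height string such as 5'9" to centimetres."""
    match = re.match(r"(\d+)'(\d+)", height)  # capture feet and inches
    if match:
        feet = int(match.group(1))
        inches = int(match.group(2))
        cms = feet * 30.48 + inches * 2.54  # 1 ft = 30.48 cm, 1 in = 2.54 cm
        return cms
    return None  # unrecognized format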

df_fifa['Height'] = df_fifa['Height'].apply(height_to_cm)

df_fifa[['Height','Weight']]


    




Unnamed: 0,Height,Weight
0,175.26,161lbs
1,182.88,159lbs
2,162.56,134lbs
3,177.80,152lbs
4,180.34,150lbs
...,...,...
13695,172.72,143lbs
13696,195.58,176lbs
13697,170.18,146lbs
13698,180.34,176lbs


In [163]:
import re

def weight_to_kg(weight):
    """Convert a weight string such as 161lbs to kilograms."""
    match = re.match(r"(\d+)lbs", weight)  # capture the numeric part
    if match:
        lbs = int(match.group(1))
        return round(lbs * 0.453592, 2)  # 1 lb = 0.453592 kg
    return None  # unrecognized format

# Apply the conversion
df_fifa['Weight'] = df_fifa['Weight'].apply(weight_to_kg)

df_fifa[['Height','Weight']]

Unnamed: 0,Height,Weight
0,175.26,73.03
1,182.88,72.12
2,162.56,60.78
3,177.80,68.95
4,180.34,68.04
...,...,...
13695,172.72,64.86
13696,195.58,79.83
13697,170.18,66.22
13698,180.34,79.83
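The two conversions can also be written without `apply`, using pandas string methods on the whole column. This is only a sketch of an alternative, not the approach taken in the notebook; `toy`, `Height_cm` and `Weight_kg` are illustrative names, and the sample values copy the format of the Height and Weight columns:

```python
import pandas as pd

# Illustrative values in the same format as the Height and Weight columns.
toy = pd.DataFrame({
    "Height": ["5'9\"", "6'0\"", "5'11\""],
    "Weight": ["161lbs", "159lbs", "150lbs"],
})

# Capture feet and inches as two columns (0 and 1), then convert to numbers.
parts = toy["Height"].str.extract(r"(\d+)'(\d+)").astype(float)
toy["Height_cm"] = (parts[0] * 30.48 + parts[1] * 2.54).round(2)

# Capture the digits before "lbs" and convert pounds to kilograms.
lbs = toy["Weight"].str.extract(r"(\d+)lbs")[0].astype(float)
toy["Weight_kg"] = (lbs * 0.453592).round(2)

print(toy)
```

Rows that do not match the pattern come out as NaN, which keeps malformed entries visible for the null-handling step.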


In [164]:
df_fifa.head()

Unnamed: 0.1,Unnamed: 0,ID,Name,Age,Nationality,Club,BP,Position,Team & Contract,Height,Weight,foot,Growth,Joined,Loan Date End,Value,Wage,Release Clause,Contract,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits,LS,ST,RS,LW,LF,CF,RF,RW,LAM,CAM,RAM,LM,LCM,CM,RCM,RM,LWB,LDM,CDM,RDM,RWB,LB,LCB,CB,RCB,RB,GK,OVA
0,1954,184383,A. Pasche,26,Switzerland,FC Lausanne-Sport,CM,CM CDM,FC Lausanne-Sport 2015 ~ 2020,175.26,73.03,Right,1,"Jul 1, 2015",,€525K,€4K,€801K,2015 ~ 2020,258,54,47,43,70,44.0,286,61,44.0,55,63,63,346,64,73,61.0,66,82.0,306,62,73.0,71,55,45,290,54,52.0,62.0,68.0,54,54.0,148,49,56,43.0,48,7,12,14,9,6,1682,357,4 ★,2★,High,Medium,1 ★,69,51,63,63,51,60,3,58+1,58+1,58+1,61+0,62+0,62+0,62+0,61+0,63+1,63+1,63+1,63+1,63+1,63+1,63+1,63+1,59+1,59+1,59+1,59+1,59+1,58+1,54+1,54+1,54+1,58+1,15+1,64
1,2225,188044,Alan Carvalho,30,China PR,Beijing Sinobo Guoan FC,ST,ST LW LM,"Beijing Sinobo Guoan FC Dec 31, 2020 On Loan",182.88,72.12,Right,0,"Jan 16, 2015","Dec 31, 2020",€8.5M,€23K,€0,"Dec 31, 2020 On Loan",365,66,79,76,68,76.0,375,83,78.0,72,63,79,404,83,83,88.0,75,75.0,372,74,81.0,75,74,68,313,54,33.0,78.0,72.0,76,70.0,77,35,20,22.0,55,11,7,14,7,16,1961,412,3 ★,4★,High,Low,2 ★,83,75,68,82,33,71,44,77+0,77+0,77+0,77+0,77+0,77+0,77+0,77+0,76+1,76+1,76+1,76+1,68+2,68+2,68+2,76+1,57+2,53+2,53+2,53+2,57+2,53+2,48+2,48+2,48+2,53+2,18+2,77
2,1959,184431,S. Giovinco,33,Italy,Al Hilal,CAM,CAM CF,Al Hilal 2019 ~ 2022,162.56,60.78,Right,0,"Jan 31, 2019",,€9M,€49K,€15.3M,2019 ~ 2022,336,73,76,34,78,75.0,424,85,89.0,91,74,85,424,84,76,93.0,78,93.0,308,79,34.0,75,42,78,332,75,26.0,80.0,78.0,73,82.0,80,23,29,28.0,21,6,3,6,3,3,1925,404,4 ★,4★,High,Medium,2 ★,80,77,78,86,27,56,73,73+2,73+2,73+2,80+0,79+0,79+0,79+0,80+0,80+0,80+0,80+0,79+1,74+2,74+2,74+2,79+1,59+2,56+2,56+2,56+2,59+2,53+2,41+2,41+2,41+2,53+2,12+2,80
3,9815,233796,J. Evans,22,Wales,Swansea City,CDM,CDM CM,Swansea City 2016 ~ 2021,177.8,68.95,Right,13,"Jul 1, 2016",,€275K,€4K,€694K,2016 ~ 2021,242,44,42,58,62,36.0,259,54,41.0,46,57,61,282,54,59,59.0,55,55.0,277,57,60.0,64,58,38,257,61,57.0,31.0,54.0,54,48.0,168,55,58,55.0,42,8,9,6,7,12,1527,329,2 ★,2★,Medium,Medium,1 ★,57,44,54,57,57,60,7,50+2,50+2,50+2,51+0,51+0,51+0,51+0,51+0,53+2,53+2,53+2,53+2,56+2,56+2,56+2,53+2,56+2,58+2,58+2,58+2,56+2,57+2,58+2,58+2,58+2,57+2,14+2,59
4,10074,234799,Y. Demoncy,23,France,US Orléans Loiret Football,CDM,CDM CM,US Orléans Loiret Football 2018 ~ 2021,180.34,68.04,Right,8,"Jul 1, 2018",,€725K,€2K,€1.4M,2018 ~ 2021,249,49,37,61,68,34.0,280,64,44.0,45,61,66,324,66,66,61.0,62,69.0,280,61,34.0,81,61,43,294,66,60.0,55.0,64.0,49,58.0,185,58,61,66.0,52,8,9,15,5,15,1664,360,2 ★,3★,Low,Medium,1 ★,66,44,60,64,60,66,4,56+2,56+2,56+2,59+0,59+0,59+0,59+0,59+0,61+2,61+2,61+2,62+2,63+2,63+2,63+2,62+2,64+2,64+2,64+2,64+2,64+2,63+2,61+2,61+2,61+2,63+2,15+2,65


In [165]:
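# Strip the "+N" suffix from every string cell in the DataFrame (this is not limited to the position-rating columns)
df_fifa = df_fifa.replace(r'\+\d+', '', regex=True)

# Inspect the first rows to confirm the change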
df_fifa.head()

Unnamed: 0.1,Unnamed: 0,ID,Name,Age,Nationality,Club,BP,Position,Team & Contract,Height,Weight,foot,Growth,Joined,Loan Date End,Value,Wage,Release Clause,Contract,Attacking,Crossing,Finishing,Heading Accuracy,Short Passing,Volleys,Skill,Dribbling,Curve,FK Accuracy,Long Passing,Ball Control,Movement,Acceleration,Sprint Speed,Agility,Reactions,Balance,Power,Shot Power,Jumping,Stamina,Strength,Long Shots,Mentality,Aggression,Interceptions,Positioning,Vision,Penalties,Composure,Defending,Marking,Standing Tackle,Sliding Tackle,Goalkeeping,GK Diving,GK Handling,GK Kicking,GK Positioning,GK Reflexes,Total Stats,Base Stats,W/F,SM,A/W,D/W,IR,PAC,SHO,PAS,DRI,DEF,PHY,Hits,LS,ST,RS,LW,LF,CF,RF,RW,LAM,CAM,RAM,LM,LCM,CM,RCM,RM,LWB,LDM,CDM,RDM,RWB,LB,LCB,CB,RCB,RB,GK,OVA
0,1954,184383,A. Pasche,26,Switzerland,FC Lausanne-Sport,CM,CM CDM,FC Lausanne-Sport 2015 ~ 2020,175.26,73.03,Right,1,"Jul 1, 2015",,€525K,€4K,€801K,2015 ~ 2020,258,54,47,43,70,44.0,286,61,44.0,55,63,63,346,64,73,61.0,66,82.0,306,62,73.0,71,55,45,290,54,52.0,62.0,68.0,54,54.0,148,49,56,43.0,48,7,12,14,9,6,1682,357,4 ★,2★,High,Medium,1 ★,69,51,63,63,51,60,3,58,58,58,61,62,62,62,61,63,63,63,63,63,63,63,63,59,59,59,59,59,58,54,54,54,58,15,64
1,2225,188044,Alan Carvalho,30,China PR,Beijing Sinobo Guoan FC,ST,ST LW LM,"Beijing Sinobo Guoan FC Dec 31, 2020 On Loan",182.88,72.12,Right,0,"Jan 16, 2015","Dec 31, 2020",€8.5M,€23K,€0,"Dec 31, 2020 On Loan",365,66,79,76,68,76.0,375,83,78.0,72,63,79,404,83,83,88.0,75,75.0,372,74,81.0,75,74,68,313,54,33.0,78.0,72.0,76,70.0,77,35,20,22.0,55,11,7,14,7,16,1961,412,3 ★,4★,High,Low,2 ★,83,75,68,82,33,71,44,77,77,77,77,77,77,77,77,76,76,76,76,68,68,68,76,57,53,53,53,57,53,48,48,48,53,18,77
2,1959,184431,S. Giovinco,33,Italy,Al Hilal,CAM,CAM CF,Al Hilal 2019 ~ 2022,162.56,60.78,Right,0,"Jan 31, 2019",,€9M,€49K,€15.3M,2019 ~ 2022,336,73,76,34,78,75.0,424,85,89.0,91,74,85,424,84,76,93.0,78,93.0,308,79,34.0,75,42,78,332,75,26.0,80.0,78.0,73,82.0,80,23,29,28.0,21,6,3,6,3,3,1925,404,4 ★,4★,High,Medium,2 ★,80,77,78,86,27,56,73,73,73,73,80,79,79,79,80,80,80,80,79,74,74,74,79,59,56,56,56,59,53,41,41,41,53,12,80
3,9815,233796,J. Evans,22,Wales,Swansea City,CDM,CDM CM,Swansea City 2016 ~ 2021,177.8,68.95,Right,13,"Jul 1, 2016",,€275K,€4K,€694K,2016 ~ 2021,242,44,42,58,62,36.0,259,54,41.0,46,57,61,282,54,59,59.0,55,55.0,277,57,60.0,64,58,38,257,61,57.0,31.0,54.0,54,48.0,168,55,58,55.0,42,8,9,6,7,12,1527,329,2 ★,2★,Medium,Medium,1 ★,57,44,54,57,57,60,7,50,50,50,51,51,51,51,51,53,53,53,53,56,56,56,53,56,58,58,58,56,57,58,58,58,57,14,59
4,10074,234799,Y. Demoncy,23,France,US Orléans Loiret Football,CDM,CDM CM,US Orléans Loiret Football 2018 ~ 2021,180.34,68.04,Right,8,"Jul 1, 2018",,€725K,€2K,€1.4M,2018 ~ 2021,249,49,37,61,68,34.0,280,64,44.0,45,61,66,324,66,66,61.0,62,69.0,280,61,34.0,81,61,43,294,66,60.0,55.0,64.0,49,58.0,185,58,61,66.0,52,8,9,15,5,15,1664,360,2 ★,3★,Low,Medium,1 ★,66,44,60,64,60,66,4,56,56,56,59,59,59,59,59,61,61,61,62,63,63,63,62,64,64,64,64,64,63,61,61,61,63,15,65
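The global `replace` leaves the ratings as strings and would also alter any other text cell that happens to contain "+N". A possible refinement, sketched here under the assumption that the rating columns are exactly those whose values all look like `58+1`, detects those columns automatically and uses `split`, as the exercise suggests. The frame `toy` and the player names are made up:

```python
import pandas as pd

# Illustrative frame: two rating columns plus a text column that must stay untouched.
toy = pd.DataFrame({
    "Name": ["A. Pasche", "Player +1"],   # second name is invented to contain "+1"
    "CDM": ["59+1", "53+2"],
    "GK": ["15+1", "18+2"],
})

# Select the columns where every value has the form "<digits>+<digits>".
rating_cols = [
    col for col in toy.columns
    if toy[col].astype(str).str.fullmatch(r"\d+\+\d+").all()
]

# Keep the part before "+" and store it as an integer.
for col in rating_cols:
    toy[col] = toy[col].str.split("+").str[0].astype(int)

print(rating_cols)
print(toy.dtypes)
```

Because the selection is pattern-based, no column has to be listed by hand, and the converted columns become numeric rather than `object`.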


In [166]:
print(df_fifa[['W/F', 'SM', 'IR']].head())


   W/F  SM   IR
0  4 ★  2★  1 ★
1  3 ★  4★  2 ★
2  4 ★  4★  2 ★
3  2 ★  2★  1 ★
4  2 ★  3★  1 ★


In [167]:
df_fifa.replace({' ★': '', '★': ''}, regex=True, inplace=True)
print(df_fifa[['W/F', 'SM', 'IR']].head())  

  W/F SM IR
0   4  2  1
1   3  4  2
2   4  4  2
3   2  2  1
4   2  3  1


In [168]:
print(df_fifa['W/F'].unique())
print(df_fifa['SM'].unique())   
print(df_fifa['IR'].unique())   

['4' '3' '2' '5' '1']
['2' '4' '3' '1' '5']
['1' '2' '3' '5' '4']
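The `unique()` output shows that the three rating columns still hold strings ('4', '3', ...), so they cannot be averaged yet. A minimal sketch of extracting the digit and casting it in one step, limited to those three columns; `toy` reproduces the raw star format for illustration:

```python
import pandas as pd

# Raw star format as it appears before cleaning (values are illustrative).
toy = pd.DataFrame({
    "W/F": ["4 ★", "3 ★"],
    "SM": ["2★", "4★"],
    "IR": ["1 ★", "2 ★"],
})

star_cols = ["W/F", "SM", "IR"]
for col in star_cols:
    # Keep the leading digits, dropping the star and any space, then cast to int.
    toy[col] = toy[col].str.extract(r"(\d+)")[0].astype(int)

print(toy.dtypes)
```

Restricting the operation to `star_cols` avoids touching unrelated text columns that might contain a star character.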


In [169]:
for i, col in enumerate(df_fifa.columns):
    print(f"{i}: {col}")


0: Unnamed: 0
1: ID
2: Name
3: Age
4: Nationality
5: Club
6: BP
7: Position
8: Team & Contract
9: Height
10: Weight
11: foot
12: Growth
13: Joined
14: Loan Date End
15: Value
16: Wage
17: Release Clause
18: Contract
19: Attacking
20: Crossing
21: Finishing
22: Heading Accuracy
23: Short Passing
24: Volleys
25: Skill
26: Dribbling
27: Curve
28: FK Accuracy
29: Long Passing
30: Ball Control
31: Movement
32: Acceleration
33: Sprint Speed
34: Agility
35: Reactions
36: Balance
37: Power
38: Shot Power
39: Jumping
40: Stamina
41: Strength
42: Long Shots
43: Mentality
44: Aggression
45: Interceptions
46: Positioning
47: Vision
48: Penalties
49: Composure
50: Defending
51: Marking
52: Standing Tackle
53: Sliding Tackle
54: Goalkeeping
55: GK Diving
56: GK Handling
57: GK Kicking
58: GK Positioning
59: GK Reflexes
60: Total Stats
61: Base Stats
62: W/F
63: SM
64: A/W
65: D/W
66: IR
67: PAC
68: SHO
69: PAS
70: DRI
71: DEF
72: PHY
73: Hits
74: LS
75: ST
76: RS
77: LW
78: LF
79: CF
80: RF
81: RW
8

In [170]:
print(df_fifa[['BP']].head())


    BP
0   CM
1   ST
2  CAM
3  CDM
4  CDM


In [171]:
def posiciones_jugadores(bp):
    """Assign a general position group based on the player's best position (BP)."""
    posiciones = {
        'GK': 'Portero',
        'RB': 'Defensa', 'CB': 'Defensa', 'LB': 'Defensa',
        'RWB': 'Defensa', 'LWB': 'Defensa',
        'CM': 'Mediocampista', 'CAM': 'Mediocampista', 'CDM': 'Mediocampista',
        'LM': 'Mediocampista', 'RM': 'Mediocampista',
        'ST': 'Delantero', 'RW': 'Delantero', 'LW': 'Delantero', 'CF': 'Delantero'
    }
    return posiciones.get(bp, 'Desconocido')  # BP codes missing from the dictionary map to 'Desconocido' (unknown)

df_fifa['Posición General'] = df_fifa['BP'].apply(posiciones_jugadores)

print(df_fifa[['BP', 'Posición General']].head(10))
print(df_fifa['Posición General'].value_counts())  # count the players in each group




    BP Posición General
0   CM    Mediocampista
1   ST        Delantero
2  CAM    Mediocampista
3  CDM    Mediocampista
4  CDM    Mediocampista
5   LM    Mediocampista
6   RB          Defensa
7   LM    Mediocampista
8   CB          Defensa
9   CB          Defensa
Posición General
Mediocampista    5359
Defensa          4405
Delantero        2655
Portero          1281
Name: count, dtype: int64


In [172]:
print(df_fifa['Posición General'].unique())


['Mediocampista' 'Delantero' 'Defensa' 'Portero']
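Here no player fell into 'Desconocido', so the dictionary covers every BP code in this file. On other data it can help to list the codes that are not covered before filling them in. A sketch using `Series.map` with an abbreviated lookup table; `GROUPS`, `bp` and the code "XX" are illustrative:

```python
import pandas as pd

# Abbreviated lookup table with the same group labels as posiciones_jugadores.
GROUPS = {
    "GK": "Portero",
    "CB": "Defensa",
    "CM": "Mediocampista",
    "ST": "Delantero",
}

bp = pd.Series(["CM", "ST", "GK", "XX"])   # "XX" is a made-up, unmapped code

general = bp.map(GROUPS)                   # unmapped codes become NaN
unmapped = bp[general.isna()].unique()     # codes the table does not cover
general = general.fillna("Desconocido")

print(unmapped)
print(general.tolist())
```

Inspecting `unmapped` first makes a missing dictionary entry visible instead of silently labelling it unknown.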


In [173]:
print(df_fifa['Value'].unique())
print(df_fifa['Wage'].unique())


['€525K' '€8.5M' '€9M' '€275K' '€725K' '€1.4M' '€6M' '€300K' '€210K'
 '€220K' '€625K' '€650K' '€800K' '€5M' '€130K' '€1.1M' '€325K' '€825K'
 '€500K' '€350K' '€1M' '€475K' '€160K' '€675K' '€14.5M' '€900K' '€0'
 '€425K' '€450K' '€60K' '€10M' '€1.8M' '€750K' '€700K' '€1.5M' '€16.5M'
 '€6.5M' '€180K' '€2.6M' '€41M' '€600K' '€200K' '€3.2M' '€44M' '€2.7M'
 '€4.5M' '€875K' '€7.5M' '€110K' '€11.5M' '€975K' '€2.3M' '€550K' '€170K'
 '€850K' '€925K' '€775K' '€80K' '€3.3M' '€575K' '€375K' '€1.3M' '€1.2M'
 '€3.5M' '€1.9M' '€190K' '€250K' '€19.5M' '€100K' '€90K' '€5.5M' '€4.2M'
 '€140K' '€400K' '€2.9M' '€4.4M' '€2M' '€240K' '€4.6M' '€230K' '€7M'
 '€2.8M' '€13.5M' '€4.3M' '€8M' '€4.8M' '€20.5M' '€150K' '€9.5M' '€2.1M'
 '€120K' '€3.6M' '€15.5M' '€12.5M' '€3.4M' '€2.5M' '€1.7M' '€41.5M'
 '€1.6M' '€2.4M' '€13M' '€14M' '€3.9M' '€3M' '€12M' '€15M' '€950K' '€4M'
 '€10.5M' '€3.8M' '€70K' '€2.2M' '€28.5M' '€50K' '€25K' '€3.1M' '€23M'
 '€30M' '€4.9M' '€34.5M' '€31.5M' '€25.5M' '€4.7M' '€18.5M' '€42M' '€34M'
 

In [174]:
import pandas as pd

# Convert monetary strings such as '€8.5M' or '€525K' to integers
def convertir_valores(valor):
    if isinstance(valor, str):
        valor = valor.replace('€', '')
        if 'M' in valor:
            # round() avoids truncating results such as 4599999.999...
            return round(float(valor.replace('M', '')) * 1_000_000)
        elif 'K' in valor:
            return round(float(valor.replace('K', '')) * 1_000)
        else:
            return int(valor)
    return 0  # non-string input (for example NaN) becomes 0

# Apply the conversion to the Value and Wage columns
df_fifa['Value'] = df_fifa['Value'].apply(convertir_valores)
df_fifa['Wage'] = df_fifa['Wage'].apply(convertir_valores)

# Check the result
print(df_fifa[['Value', 'Wage']].head(10))
print(df_fifa.dtypes)



     Value   Wage
0   525000   4000
1  8500000  23000
2  9000000  49000
3   275000   4000
4   725000   2000
5  1400000  11000
6  6000000  28000
7   300000    800
8   210000   2000
9   220000   4000
Unnamed: 0           int64
ID                   int64
Name                object
Age                  int64
Nationality         object
                     ...  
RCB                 object
RB                  object
GK                  object
OVA                  int64
Posición General    object
Length: 103, dtype: object
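With Value and Wage stored as integers, the sums and means the exercise asks for come from a `groupby`. A minimal sketch on invented data; the club names "X" and "Y" and the frame `toy` are placeholders, while the column names match the notebook:

```python
import pandas as pd

# Invented rows; only the column names follow the notebook.
toy = pd.DataFrame({
    "Club": ["X", "X", "Y"],
    "Posición General": ["Defensa", "Delantero", "Defensa"],
    "Value": [525_000, 8_500_000, 275_000],
})

# Sum, mean and number of players per club and per position group.
by_club = toy.groupby("Club")["Value"].agg(["sum", "mean", "count"])
by_position = toy.groupby("Posición General")["Value"].agg(["sum", "mean", "count"])

print(by_club)
print(by_position)
```

The same pattern works for Wage or for grouping by Nationality by swapping the column names.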


In [175]:
print(df_fifa.isnull().sum()[df_fifa.isnull().sum() > 0])


Club                 21
Position            343
Joined               44
Loan Date End     12961
Volleys              44
Curve                44
Agility              44
Balance              44
Jumping              44
Interceptions         7
Positioning           7
Vision               44
Composure           329
Sliding Tackle       44
A/W                  67
D/W                  67
dtype: int64
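Raw counts are easier to judge next to the share of rows they represent: 12961 nulls out of 13700 rows is a very different situation from 7. A sketch of a small summary table on made-up data; `toy` and `summary` are illustrative names:

```python
import numpy as np
import pandas as pd

# Made-up frame with nulls in two of its three columns.
toy = pd.DataFrame({
    "Club": ["X", None, "Y", "Z"],
    "Volleys": [44.0, np.nan, np.nan, 36.0],
    "Age": [26, 30, 33, 22],
})

is_null = toy.isna()
summary = pd.DataFrame({
    "nulls": is_null.sum(),                  # count of nulls per column
    "pct": (is_null.mean() * 100).round(2),  # share of rows that are null
})

# Keep only the columns that have nulls, largest first.
summary = summary[summary["nulls"] > 0].sort_values("nulls", ascending=False)
print(summary)
```

Computing `isna()` once and reusing it also avoids scanning the frame twice.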


In [176]:
print(df_fifa['Interceptions'].unique())


[52. 33. 26. 57. 60. 38. 77. 69. 67. 70. 20. 56. 25. 16. 68. 21. 65.  8.
 18. 22. 35. 17. 41. 63. 23. 29. 40. 32. 42. 15. 19. 59. 30. 74. 50. 27.
 72. 81. 13. 62. 31. 86. 43. 64. 10. 37. 28. 51. 39. 44. 53. 66. 79. 73.
 11. 61. 75. 78. 48. 55. 49. 24. 76. 58. 45. 71. 14. 84. 54. 36.  7. 47.
 12. 34.  6. 85. 80. 82.  9. 46. 89. 90. 83. 87. nan 91.  5. 88. 95.  4.]


In [177]:
# 1. Fill missing clubs with "Sin Club" (no club)
df_fifa['Club'] = df_fifa['Club'].fillna('Sin Club')

# 2. Fill missing positions with "Unknown" (or drop those rows if needed)
df_fifa['Position'] = df_fifa['Position'].fillna('Unknown')

# 3. Fill missing join dates with "Desconocido" (unknown)
df_fifa['Joined'] = df_fifa['Joined'].fillna('Desconocido')

# 4. Drop the "Loan Date End" column (almost all values are null)
df_fifa = df_fifa.drop(columns=['Loan Date End'])

# 5. Fill technical attributes with the column mean
cols_media = ['Volleys', 'Curve', 'Agility', 'Balance', 'Jumping', 'Vision', 'Composure', 'Sliding Tackle']
df_fifa[cols_media] = df_fifa[cols_media].apply(lambda x: x.fillna(x.mean()))

# 6. Fill "Interceptions" and "Positioning" with the mean of each position group
df_fifa['Interceptions'] = df_fifa.groupby('Posición General')['Interceptions'].transform(lambda x: x.fillna(x.mean()))
df_fifa['Positioning'] = df_fifa.groupby('Posición General')['Positioning'].transform(lambda x: x.fillna(x.mean()))

# 7. Fill "A/W" and "D/W" with the mode (most common value)
df_fifa['A/W'] = df_fifa['A/W'].fillna(df_fifa['A/W'].mode()[0])
df_fifa['D/W'] = df_fifa['D/W'].fillna(df_fifa['D/W'].mode()[0])

# Check whether any null values remain
print(df_fifa.isnull().sum())


Unnamed: 0          0
ID                  0
Name                0
Age                 0
Nationality         0
                   ..
RCB                 0
RB                  0
GK                  0
OVA                 0
Posición General    0
Length: 102, dtype: int64


In [178]:
print(df_fifa.isnull().sum()[df_fifa.isnull().sum() > 0])


Series([], dtype: int64)


In [179]:
print(df_fifa.describe())


         Unnamed: 0             ID           Age        Height        Weight  \
count  13700.000000   13700.000000  13700.000000  13700.000000  13700.000000   
mean    8561.641971  219378.929708     25.286058    181.225848     75.133339   
std     4954.144383   37445.343736      4.944498      6.893956      7.171942   
min        1.000000      16.000000     16.000000    154.940000     53.070000   
25%     4269.750000  203979.750000     21.000000    175.260000     69.850000   
50%     8555.500000  228927.000000     25.000000    180.340000     74.840000   
75%    12887.250000  244043.500000     29.000000    185.420000     79.830000   
max    17122.000000  259091.000000     47.000000    205.740000    110.220000   

             Growth         Value           Wage     Attacking      Crossing  \
count  13700.000000  1.370000e+04   13700.000000  13700.000000  13700.000000   
mean       5.509927  2.549592e+06   10442.233577    258.313431     51.585766   
std        5.797176  5.294546e+06   210

In [None]:
import matplotlib.pyplot as plt  # required for plt.figure and plt.show
import seaborn as sns

plt.figure(figsize=(12, 6))
sns.boxplot(data=df_fifa.select_dtypes(include=['number']))
plt.xticks(rotation=90)
plt.title("Boxplot of numeric variables (outliers)")
plt.show()
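The exercise suggests capping extreme values that are not outright errors. One common convention, assumed here rather than taken from the notebook, flags points beyond 1.5 times the interquartile range and clips them to that fence. The series `wage` below holds invented values:

```python
import pandas as pd

# Invented wages with one extreme value.
wage = pd.Series([2_000, 4_000, 4_000, 11_000, 23_000, 500_000], name="Wage")

q1, q3 = wage.quantile(0.25), wage.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # the 1.5 factor is a convention

outliers = wage[(wage < lower) | (wage > upper)]  # values outside the fences
capped = wage.clip(lower=lower, upper=upper)      # pull them back to the fence

print(outliers.tolist())
print(capped.tolist())
```

Whether to cap, drop or keep such values depends on the later analysis; for highly skewed columns such as Value and Wage, a per-column boxplot or a log scale may be easier to read than one shared axis.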



In [181]:
print(df_fifa.dtypes)



Unnamed: 0           int64
ID                   int64
Name                object
Age                  int64
Nationality         object
                     ...  
RCB                 object
RB                  object
GK                  object
OVA                  int64
Posición General    object
Length: 102, dtype: object


In [185]:
for i, col in enumerate(df_fifa.columns):
    print(f"{i}: {col} - {df_fifa[col].dtype}")




0: Unnamed: 0 - int64
1: ID - int64
2: Name - object
3: Age - int64
4: Nationality - object
5: Club - object
6: BP - object
7: Position - object
8: Team & Contract - object
9: Height - float64
10: Weight - float64
11: foot - object
12: Growth - int64
13: Joined - object
14: Value - int64
15: Wage - int64
16: Release Clause - object
17: Contract - object
18: Attacking - int64
19: Crossing - int64
20: Finishing - int64
21: Heading Accuracy - int64
22: Short Passing - int64
23: Volleys - float64
24: Skill - int64
25: Dribbling - int64
26: Curve - float64
27: FK Accuracy - int64
28: Long Passing - int64
29: Ball Control - int64
30: Movement - int64
31: Acceleration - int64
32: Sprint Speed - int64
33: Agility - float64
34: Reactions - int64
35: Balance - float64
36: Power - int64
37: Shot Power - int64
38: Jumping - float64
39: Stamina - int64
40: Strength - int64
41: Long Shots - int64
42: Mentality - int64
43: Aggression - int64
44: Interceptions - float64
45: Positioning - float64
46: V