## Data Quality
1. Clean the Fifa21 dataset in Python with Pandas.
    - Drop the columns that carry no useful data, such as the photo URL and the source (player page) URL.
    - Format the data in the Club, Hits, Weight, Height, Joined, Value, Release Clause and Wage columns.
    - Determine the most suitable data type for each column and apply the necessary transformations.
    - Consider renaming the columns.
2. Check whether the dataset contains duplicate values. Do they need to be removed?
3. Research the following techniques for improving a dataset with missing values:
    - Discarding rows (listwise deletion).
    - Simple imputation.
    - Multiple imputation.
    - Interpolation.
    - Predictive modelling.
4. Identify the missing values in the Hits column and fill the column in using some of the techniques above.
5. Based on the Value column, export to a new .csv file a sample with the top 25% most valuable players in FIFA 21.
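Several of the techniques in item 3 can be compared on a small example. This is a minimal sketch that uses pandas, NumPy and a hypothetical toy DataFrame `toy`, not the Fifa21 data. It shows listwise deletion with `dropna`, simple imputation with the median, and linear interpolation. A straight-line fit from `np.polyfit` stands in for a predictive model. Multiple imputation is not shown because it usually relies on dedicated tools such as scikit-learn's `IterativeImputer` or the MICE implementation in statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from fifa21.csv
toy = pd.DataFrame({
    "Age": [21, 25, 30, 34, 28, 23],
    "Hits": [10.0, np.nan, 30.0, np.nan, 50.0, 60.0],
})

# Listwise deletion: drop every row that has at least one missing value
listwise = toy.dropna()

# Simple imputation: fill every gap with one statistic (here, the median)
simple = toy["Hits"].fillna(toy["Hits"].median())

# Interpolation: estimate each gap from its neighbours in row order
interpolated = toy["Hits"].interpolate(method="linear")

# Predictive modelling: fit Hits ~ Age on the complete rows, then predict the gaps
known = toy.dropna(subset=["Hits"])
slope, intercept = np.polyfit(known["Age"], known["Hits"], deg=1)
predicted = toy["Hits"].copy()
missing = predicted.isna()
predicted.loc[missing] = slope * toy.loc[missing, "Age"] + intercept

print(len(listwise))          # 4
print(simple.tolist())        # [10.0, 40.0, 30.0, 40.0, 50.0, 60.0]
print(interpolated.tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
```

Interpolation only makes sense when row order carries meaning, as with time or position. For an unordered player table, the median or a model built on related columns is usually the safer choice.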

Import libraries

In [11]:
import pandas as pd
import re
import numpy as np

Reading the data

In [12]:
fifa_df = pd.read_csv("./fifa21.csv")
fifa_df.head()

,ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,Club,Contract,Positions,...,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
0,158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",...,,€103.5M,€560K,€138.4M,4 ★,4★,Medium,Low,5 ★,771
1,20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",...,,€63M,€220K,€75.9M,4 ★,5★,High,Low,5 ★,562
2,200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,...,,€120M,€125K,€159.4M,3 ★,1★,Medium,Medium,3 ★,150
3,192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",...,,€129M,€370K,€161M,5 ★,4★,High,High,4 ★,207
4,190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",...,,€132M,€270K,€166.5M,5 ★,5★,High,Medium,5 ★,595


In [13]:
fifa_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18979 entries, 0 to 18978
Data columns (total 25 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   ID              18979 non-null  int64 
 1   Name            18979 non-null  object
 2   LongName        18979 non-null  object
 3   photoUrl        18979 non-null  object
 4   playerUrl       18979 non-null  object
 5   Nationality     18979 non-null  object
 6   Age             18979 non-null  int64 
 7   Club            18979 non-null  object
 8   Contract        18979 non-null  object
 9   Positions       18979 non-null  object
 10  Height          18979 non-null  object
 11  Weight          18979 non-null  object
 12  Preferred Foot  18979 non-null  object
 13  Best Position   18979 non-null  object
 14  Joined          18979 non-null  object
 15  Loan Date End   1013 non-null   object
 16  Value           18979 non-null  object
 17  Wage            18979 non-null  object
 18  Releas

Set the ID column as the index, modifying the same object (inplace=True)

In [14]:
fifa_df.set_index("ID", inplace=True)
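`set_index("ID")` assumes that each ID identifies a single player, which ties in with item 2. The sketch below shows the usual duplicate checks on hypothetical toy rows rather than on fifa21.csv. It separates exact duplicate rows from repeated IDs, because the two can differ.

```python
import pandas as pd

# Hypothetical toy rows; the third row is an exact copy of the first
players = pd.DataFrame({
    "ID": [158023, 20801, 158023, 200389],
    "Name": ["L. Messi", "Cristiano Ronaldo", "L. Messi", "J. Oblak"],
    "Age": [33, 35, 33, 27],
})

# Rows identical to an earlier row in every column
full_dupes = players.duplicated().sum()

# Rows whose ID already appeared, even if other columns differ
id_dupes = players.duplicated(subset=["ID"]).sum()

# After set_index, the same question is answered by Index.is_unique
indexed = players.set_index("ID")
print(full_dupes, id_dupes, indexed.index.is_unique)  # 1 1 False

# One possible resolution: keep the first occurrence of each ID
deduplicated = players.drop_duplicates(subset=["ID"], keep="first")
```

On the notebook's data, `fifa_df.index.is_unique` answers the ID question after the cell above. Alternatively, passing `verify_integrity=True` to `set_index` makes pandas raise a `ValueError` when the new index contains duplicates.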

In [15]:
fifa_df

ID,Name,LongName,photoUrl,playerUrl,Nationality,Age,Club,Contract,Positions,Height,...,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
158023,L. Messi,Lionel Messi,https://cdn.sofifa.com/players/158/023/21_60.png,http://sofifa.com/player/158023/lionel-messi/2...,Argentina,33,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,...,,€103.5M,€560K,€138.4M,4 ★,4★,Medium,Low,5 ★,771
20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,https://cdn.sofifa.com/players/020/801/21_60.png,http://sofifa.com/player/20801/c-ronaldo-dos-s...,Portugal,35,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",187cm,...,,€63M,€220K,€75.9M,4 ★,5★,High,Low,5 ★,562
200389,J. Oblak,Jan Oblak,https://cdn.sofifa.com/players/200/389/21_60.png,http://sofifa.com/player/200389/jan-oblak/210006/,Slovenia,27,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,188cm,...,,€120M,€125K,€159.4M,3 ★,1★,Medium,Medium,3 ★,150
192985,K. De Bruyne,Kevin De Bruyne,https://cdn.sofifa.com/players/192/985/21_60.png,http://sofifa.com/player/192985/kevin-de-bruyn...,Belgium,29,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",181cm,...,,€129M,€370K,€161M,5 ★,4★,High,High,4 ★,207
190871,Neymar Jr,Neymar da Silva Santos Jr.,https://cdn.sofifa.com/players/190/871/21_60.png,http://sofifa.com/player/190871/neymar-da-silv...,Brazil,28,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,...,,€132M,€270K,€166.5M,5 ★,5★,High,Medium,5 ★,595
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247223,Xia Ao,Ao Xia,https://cdn.sofifa.com/players/247/223/21_60.png,http://sofifa.com/player/247223/ao-xia/210006/,China PR,21,\n\n\n\nWuhan Zall,2018 ~ 2022,CB,178cm,...,,€100K,€1K,€70K,2 ★,2★,Medium,Medium,1 ★,
258760,B. Hough,Ben Hough,https://cdn.sofifa.com/players/258/760/21_60.png,http://sofifa.com/player/258760/ben-hough/210006/,England,17,\n\n\n\nOldham Athletic,2020 ~ 2021,CM,175cm,...,,€130K,€500,€165K,2 ★,2★,Medium,Medium,1 ★,
252757,R. McKinley,Ronan McKinley,https://cdn.sofifa.com/players/252/757/21_60.png,http://sofifa.com/player/252757/ronan-mckinley...,England,18,\n\n\n\nDerry City,2019 ~ 2020,CM,179cm,...,,€120K,€500,€131K,2 ★,2★,Medium,Medium,1 ★,
243790,Wang Zhen'ao,Zhen'ao Wang,https://cdn.sofifa.com/players/243/790/21_60.png,http://sofifa.com/player/243790/zhenao-wang/21...,China PR,20,\n\n\n\nDalian YiFang FC,2020 ~ 2022,RW,175cm,...,,€100K,€2K,€88K,3 ★,2★,Medium,Medium,1 ★,


Dropping the photoUrl and playerUrl columns

In [16]:
fifa_df_cleaned = fifa_df.drop(["photoUrl", "playerUrl"], axis=1)
fifa_df_cleaned

ID,Name,LongName,Nationality,Age,Club,Contract,Positions,Height,Weight,Preferred Foot,...,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
158023,L. Messi,Lionel Messi,Argentina,33,\n\n\n\nFC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,...,,€103.5M,€560K,€138.4M,4 ★,4★,Medium,Low,5 ★,771
20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,Portugal,35,\n\n\n\nJuventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,...,,€63M,€220K,€75.9M,4 ★,5★,High,Low,5 ★,562
200389,J. Oblak,Jan Oblak,Slovenia,27,\n\n\n\nAtlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,...,,€120M,€125K,€159.4M,3 ★,1★,Medium,Medium,3 ★,150
192985,K. De Bruyne,Kevin De Bruyne,Belgium,29,\n\n\n\nManchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,...,,€129M,€370K,€161M,5 ★,4★,High,High,4 ★,207
190871,Neymar Jr,Neymar da Silva Santos Jr.,Brazil,28,\n\n\n\nParis Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,...,,€132M,€270K,€166.5M,5 ★,5★,High,Medium,5 ★,595
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247223,Xia Ao,Ao Xia,China PR,21,\n\n\n\nWuhan Zall,2018 ~ 2022,CB,178cm,66kg,Right,...,,€100K,€1K,€70K,2 ★,2★,Medium,Medium,1 ★,
258760,B. Hough,Ben Hough,England,17,\n\n\n\nOldham Athletic,2020 ~ 2021,CM,175cm,65kg,Right,...,,€130K,€500,€165K,2 ★,2★,Medium,Medium,1 ★,
252757,R. McKinley,Ronan McKinley,England,18,\n\n\n\nDerry City,2019 ~ 2020,CM,179cm,74kg,Right,...,,€120K,€500,€131K,2 ★,2★,Medium,Medium,1 ★,
243790,Wang Zhen'ao,Zhen'ao Wang,China PR,20,\n\n\n\nDalian YiFang FC,2020 ~ 2022,RW,175cm,69kg,Right,...,,€100K,€2K,€88K,3 ★,2★,Medium,Medium,1 ★,


In [17]:
# Remove newline, carriage-return and tab characters from the Club name
fifa_df_cleaned['Club'] = fifa_df_cleaned['Club'].apply(lambda x: re.sub(r'[\n\r\t]+', '', x).strip())
fifa_df_cleaned['Club'].head()

ID
158023           FC Barcelona
20801                Juventus
200389        Atlético Madrid
192985        Manchester City
190871    Paris Saint-Germain
Name: Club, dtype: object

In [18]:
# Convert Hits to numbers; any value that cannot be parsed becomes NaN
fifa_df_cleaned["Hits"] = pd.to_numeric(fifa_df_cleaned["Hits"], errors="coerce")
fifa_df_cleaned

ID,Name,LongName,Nationality,Age,Club,Contract,Positions,Height,Weight,Preferred Foot,...,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
158023,L. Messi,Lionel Messi,Argentina,33,FC Barcelona,2004 ~ 2021,"RW, ST, CF",170cm,72kg,Left,...,,€103.5M,€560K,€138.4M,4 ★,4★,Medium,Low,5 ★,771.0
20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,Portugal,35,Juventus,2018 ~ 2022,"ST, LW",187cm,83kg,Right,...,,€63M,€220K,€75.9M,4 ★,5★,High,Low,5 ★,562.0
200389,J. Oblak,Jan Oblak,Slovenia,27,Atlético Madrid,2014 ~ 2023,GK,188cm,87kg,Right,...,,€120M,€125K,€159.4M,3 ★,1★,Medium,Medium,3 ★,150.0
192985,K. De Bruyne,Kevin De Bruyne,Belgium,29,Manchester City,2015 ~ 2023,"CAM, CM",181cm,70kg,Right,...,,€129M,€370K,€161M,5 ★,4★,High,High,4 ★,207.0
190871,Neymar Jr,Neymar da Silva Santos Jr.,Brazil,28,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",175cm,68kg,Right,...,,€132M,€270K,€166.5M,5 ★,5★,High,Medium,5 ★,595.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247223,Xia Ao,Ao Xia,China PR,21,Wuhan Zall,2018 ~ 2022,CB,178cm,66kg,Right,...,,€100K,€1K,€70K,2 ★,2★,Medium,Medium,1 ★,
258760,B. Hough,Ben Hough,England,17,Oldham Athletic,2020 ~ 2021,CM,175cm,65kg,Right,...,,€130K,€500,€165K,2 ★,2★,Medium,Medium,1 ★,
252757,R. McKinley,Ronan McKinley,England,18,Derry City,2019 ~ 2020,CM,179cm,74kg,Right,...,,€120K,€500,€131K,2 ★,2★,Medium,Medium,1 ★,
243790,Wang Zhen'ao,Zhen'ao Wang,China PR,20,Dalian YiFang FC,2020 ~ 2022,RW,175cm,69kg,Right,...,,€100K,€2K,€88K,3 ★,2★,Medium,Medium,1 ★,
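The last rows above already show empty Hits after the conversion, which is where item 4 starts. This sketch works on hypothetical raw strings. It counts the gaps and fills them with the median of the observed values (simple imputation), and it keeps a flag column so imputed rows stay identifiable. The `K` suffix (e.g. `"1.6K"`) is an assumption about the raw column that the output above does not show. If that suffix does occur, `errors="coerce"` on its own would silently turn those entries into NaN as well.

```python
import pandas as pd

# Hypothetical raw Hits strings; the "K" form is an assumed format
raw_hits = pd.Series(["771", "562", "1.6K", None, "150", None, "207"], name="Hits")

# Parse an optional K suffix instead of letting errors="coerce" drop it
text = raw_hits.str.strip()
is_k = text.str.endswith("K", na=False).astype(bool)
numbers = pd.to_numeric(text.str.rstrip("K"), errors="coerce")
hits = numbers.where(~is_k, numbers * 1_000)

# Identify the missing values
missing_mask = hits.isna()
print("Missing Hits:", int(missing_mask.sum()))  # Missing Hits: 2

# Simple imputation with the median of the observed values, plus a flag column
hits_filled = hits.fillna(hits.median()).round().astype("Int64")
result = pd.DataFrame({"Hits": hits_filled, "Hits_imputed": missing_mask})
```

The flag column lets later analyses exclude or down-weight imputed rows. This matters here because most missing Hits belong to lesser-known players, so a global median may overstate their popularity.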

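The conversion below keeps only the digits of Height and Weight. That is correct only if every Height is in cm and every Weight is in kg. Only cm and kg values appear in the rows above, so imperial units are an assumption here. If the full column also held feet-and-inches heights such as 6'2" or pound weights such as 165lbs, digit-stripping would turn 6'2" into 62 and read 165 lbs as 165 kg. Here is a hedged sketch of unit-aware helpers. `height_to_cm` and `weight_to_kg` are hypothetical names.

```python
import re

import pandas as pd

def height_to_cm(text) -> float:
    """Convert '170cm' or a feet-and-inches value such as 6'2" to centimetres; NaN otherwise."""
    text = str(text).strip()
    metric = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", text)
    if metric:
        return float(metric.group(1))
    imperial = re.fullmatch(r"(\d+)'(\d+)\"?", text)  # assumed format, e.g. 6'2"
    if imperial:
        inches = int(imperial.group(1)) * 12 + int(imperial.group(2))
        return round(inches * 2.54, 1)
    return float("nan")

def weight_to_kg(text) -> float:
    """Convert '72kg' or '165lbs' (assumed format) to kilograms; NaN otherwise."""
    text = str(text).strip()
    metric = re.fullmatch(r"(\d+(?:\.\d+)?)\s*kg", text)
    if metric:
        return float(metric.group(1))
    imperial = re.fullmatch(r"(\d+(?:\.\d+)?)\s*lbs", text)
    if imperial:
        return round(float(imperial.group(1)) * 0.45359237, 1)
    return float("nan")

heights = pd.Series(["170cm", "187cm", "6'2\""]).map(height_to_cm)
weights = pd.Series(["72kg", "165lbs"]).map(weight_to_kg)
print(heights.tolist())  # [170.0, 187.0, 188.0]
print(weights.tolist())  # [72.0, 74.8]
```

Before converting, a quick check such as `fifa_df_cleaned["Height"].str.endswith("cm").all()` shows whether the metric-only assumption holds for the real data.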

In [None]:
df = fifa_df_cleaned

def try_convert_object_cols(df):
    """Convert the text columns that hold quantities to numeric dtypes.

    Returns the converted copy and a dict {column: resulting dtype}.
    """
    df = df.copy()
    converted = {}

    # Note: Joined is a date column; stripping it to digits cannot produce a meaningful number
    obj_cols = ['Height', 'Weight', 'Joined', 'Value', 'Wage', 'Release Clause', 'Hits']

    for column in obj_cols:
        string_data = df[column].astype(str).str.strip()

        # Negatives in parentheses: "(123)" -> "-123"
        string_data = string_data.str.replace(r"^\((.*)\)$", r"-\1", regex=True)

        # Remove common non-numeric symbols (currency signs, units such as cm/kg, letters).
        # The K/M suffixes are dropped WITHOUT scaling: "€103.5M" -> 103.5, "€100K" -> 100
        s_clean = string_data.str.replace(r"[^\d\-\., ]", "", regex=True).str.strip()

        s_en = (
            s_clean
            .str.replace(r"\s+", "", regex=True)
            .str.replace(r",(?=\d{3}(\D|$))", "", regex=True)  # remove thousands separators
        )

        candidate = pd.to_numeric(s_en, errors="coerce")

        # If every non-null value is a whole number, use the nullable Int64 dtype
        if (candidate.dropna() % 1 == 0).all():
            df[column] = candidate.astype("Int64")
            converted[column] = "Int64"
        else:
            df[column] = candidate.astype("float64")
            converted[column] = "float64"

    return df, converted

df_convertido, cols = try_convert_object_cols(df)

print("Converted columns:", cols)

df_convertido


Converted columns: {'Height': 'Int64', 'Weight': 'Int64', 'Joined': 'Int64', 'Value': 'float64', 'Wage': 'Int64', 'Release Clause': 'float64', 'Hits': 'Int64'}


ID,Name,LongName,Nationality,Age,Club,Contract,Positions,Height,Weight,Preferred Foot,...,Loan Date End,Value,Wage,Release Clause,W/F,SM,A/W,D/W,IR,Hits
158023,L. Messi,Lionel Messi,Argentina,33,FC Barcelona,2004 ~ 2021,"RW, ST, CF",170,72,Left,...,,103.5,560,138.4,4 ★,4★,Medium,Low,5 ★,771
20801,Cristiano Ronaldo,C. Ronaldo dos Santos Aveiro,Portugal,35,Juventus,2018 ~ 2022,"ST, LW",187,83,Right,...,,63.0,220,75.9,4 ★,5★,High,Low,5 ★,562
200389,J. Oblak,Jan Oblak,Slovenia,27,Atlético Madrid,2014 ~ 2023,GK,188,87,Right,...,,120.0,125,159.4,3 ★,1★,Medium,Medium,3 ★,150
192985,K. De Bruyne,Kevin De Bruyne,Belgium,29,Manchester City,2015 ~ 2023,"CAM, CM",181,70,Right,...,,129.0,370,161.0,5 ★,4★,High,High,4 ★,207
190871,Neymar Jr,Neymar da Silva Santos Jr.,Brazil,28,Paris Saint-Germain,2017 ~ 2022,"LW, CAM",175,68,Right,...,,132.0,270,166.5,5 ★,5★,High,Medium,5 ★,595
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
247223,Xia Ao,Ao Xia,China PR,21,Wuhan Zall,2018 ~ 2022,CB,178,66,Right,...,,100.0,1,70.0,2 ★,2★,Medium,Medium,1 ★,
258760,B. Hough,Ben Hough,England,17,Oldham Athletic,2020 ~ 2021,CM,175,65,Right,...,,130.0,500,165.0,2 ★,2★,Medium,Medium,1 ★,
252757,R. McKinley,Ronan McKinley,England,18,Derry City,2019 ~ 2020,CM,179,74,Right,...,,120.0,500,131.0,2 ★,2★,Medium,Medium,1 ★,
243790,Wang Zhen'ao,Zhen'ao Wang,China PR,20,Dalian YiFang FC,2020 ~ 2022,RW,175,69,Right,...,,100.0,2,88.0,3 ★,2★,Medium,Medium,1 ★,
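The conversion above removes the € sign and the K/M suffixes without scaling, so the numbers no longer share a unit. Messi's Value of €103.5M becomes 103.5, while Xia Ao's €100K becomes 100.0. In Wage, B. Hough's €500 becomes 500, while Xia Ao's €1K becomes 1. Below is a sketch of a suffix-aware parser. It assumes every value looks like `€<number>` followed by an optional `K` or `M`. `parse_money` and `SUFFIX_MULTIPLIER` are hypothetical names.

```python
import pandas as pd

# Multipliers for the suffixes seen in Value, Wage and Release Clause
SUFFIX_MULTIPLIER = {"K": 1_000, "M": 1_000_000}

def parse_money(series: pd.Series) -> pd.Series:
    """Convert strings such as '€103.5M', '€560K' or '€500' to euros (float).

    Hypothetical helper: anything that does not match '€<number>[K|M]' becomes NaN.
    """
    parts = series.astype(str).str.strip().str.extract(r"^€?\s*([\d.]+)\s*([KM]?)$")
    amount = pd.to_numeric(parts[0], errors="coerce")
    # An empty suffix (plain euros) or a non-match maps to NaN, then to 1
    multiplier = parts[1].map(SUFFIX_MULTIPLIER).fillna(1).astype(float)
    return amount * multiplier

raw = pd.Series(["€103.5M", "€100K", "€500", "€1K"])
print(parse_money(raw).tolist())  # [103500000.0, 100000.0, 500.0, 1000.0]
```

On the notebook's data, the helper could be applied column by column, as in `fifa_df_cleaned[col] = parse_money(fifa_df_cleaned[col])` for Value, Wage and Release Clause. Item 1 also suggests renaming columns, so the unit could then move into the name, for example `.rename(columns={"Value": "Value_EUR"})` (an illustrative name).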

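Joined holds dates, so reducing it to digits cannot give a meaningful number, and a datetime dtype fits it better. This sketch assumes the raw strings look like `"Jul 1, 2004"`. The Joined column does not appear in the outputs above, so check its real format before relying on the `format` string.

```python
import pandas as pd

# Hypothetical sample in the assumed "Mon D, YYYY" format
joined_raw = pd.Series(["Jul 1, 2004", "Jul 10, 2018", "not a date"])

# errors="coerce" turns anything that does not match the format into NaT
joined = pd.to_datetime(joined_raw.str.strip(), format="%b %d, %Y", errors="coerce")
```

With a real datetime column, `joined.dt.year` or differences between dates become available directly. A count of `NaT` values after conversion also reveals rows whose format differs from the assumed one.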

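Item 5 is read here as the top quartile by Value. Once Value is expressed in a single unit, those players are the ones at or above the 75th percentile of Value. This sketch uses the Value figures of the nine players displayed above, converted by hand to euros. The output file name is hypothetical.

```python
import pandas as pd

# Value of the nine players displayed above, converted by hand to euros
players = pd.DataFrame({
    "Name": ["L. Messi", "Cristiano Ronaldo", "J. Oblak", "K. De Bruyne", "Neymar Jr",
             "Xia Ao", "B. Hough", "R. McKinley", "Wang Zhen'ao"],
    "Value": [103_500_000.0, 63_000_000.0, 120_000_000.0, 129_000_000.0, 132_000_000.0,
              100_000.0, 130_000.0, 120_000.0, 100_000.0],
})

# The 75th percentile is the lower edge of the top 25% by Value
threshold = players["Value"].quantile(0.75)
top_quartile = (
    players[players["Value"] >= threshold]
    .sort_values("Value", ascending=False)
)

# Hypothetical output file name
top_quartile.to_csv("fifa21_top25_value.csv", index=False)
```

The `>=` comparison keeps players tied at the threshold, so the exported sample can hold slightly more than a quarter of the rows when many values coincide.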
