# Cleaning Player Transfer Data

This notebook walks through cleaning a player transfer dataset with Python and Pandas in a Jupyter Notebook environment.

## Importing the Libraries

Start by importing the libraries needed to clean the data.

In [136]:
import pandas as pd
import numpy as np
import pymongo
import re
from pymongo import MongoClient


## Loading the Data
Load the data from a CSV file into a Pandas DataFrame.

In [119]:
data = pd.read_csv('exemple.csv')


## Initial Exploration
Run an initial exploration to get a better understanding of the data.

In [120]:
pd.set_option('display.max_rows', 176)  # Show all rows (the dataset has 176 entries)


In [121]:
# Show the first rows of the DataFrame
data.head()


Unnamed: 0,Player,Age,Nationality,Left Team,Joined Team,Transfer Date,Market Value,Fee
0,Ross Barkley,29,England,OGC Nice,Luton Town,"Aug 9, 2023",6.00m,free transfer
1,Sander Berge,25,Norway,Sheffield United,Burnley FC,"Aug 9, 2023",20.00m,13.95m
2,Álex Gallar,31,Spain,Málaga CF,UD Ibiza,"Aug 9, 2023",700k,free transfer
3,Aimen Mahious,25,Algeria,USM Alger,Yverdon Sport FC,"Aug 9, 2023",750k,free transfer
4,Sergio Arribas,21,Spain,Real Madrid Castilla,UD Almería,"Aug 9, 2023",5.00m,6.00m


In [122]:
# Show column data types and non-null counts
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 176 entries, 0 to 175
Data columns (total 8 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Player         176 non-null    object
 1   Age            176 non-null    int64 
 2   Nationality    176 non-null    object
 3   Left Team      176 non-null    object
 4   Joined Team    176 non-null    object
 5   Transfer Date  176 non-null    object
 6   Market Value   176 non-null    object
 7   Fee            176 non-null    object
dtypes: int64(1), object(7)
memory usage: 11.1+ KB


In [123]:
# Check for missing values
print(data.isnull().sum())

Player           0
Age              0
Nationality      0
Left Team        0
Joined Team      0
Transfer Date    0
Market Value     0
Fee              0
dtype: int64
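
A count of zero here only means there are no true nulls. A later cell filters out rows whose `Fee` is `'?'` or `'-'`, and `isnull()` does not see such placeholder strings. The sketch below shows one way to count them explicitly. The `sample` DataFrame and its player labels are made up for illustration and are not taken from the real file.

```python
import pandas as pd

# Hypothetical sample mimicking the 'Fee' column (not the real data)
sample = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Fee": ["13.95m", "?", "-", "free transfer"],
})

# isnull() finds nothing: '?' and '-' are ordinary strings, not nulls
missing_count = int(sample["Fee"].isnull().sum())

# Count the placeholder strings explicitly
placeholders = ["?", "-"]
placeholder_count = int(sample["Fee"].isin(placeholders).sum())

print(missing_count, placeholder_count)
```

Running the same `isin` check on `data['Fee']` before any filtering would show how many rows are about to be dropped.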


In [124]:
# Replace 'free transfer' and 'loan transfer' with 0 in the 'Fee' column
data['Fee'] = data['Fee'].replace('free transfer', 0)
data['Fee'] = data['Fee'].replace('loan transfer', 0)


# Strip the 'm' (millions) and 'k' (thousands) suffixes.
# The integer 0 values set above are not strings, so .str.replace turns them
# into NaN; fillna(0) in a later cell restores them to 0.
data['Fee'] = data['Fee'].str.replace('m', '')
data['Fee'] = data['Fee'].str.replace('k', '')



In [125]:
data.head()


Unnamed: 0,Player,Age,Nationality,Left Team,Joined Team,Transfer Date,Market Value,Fee
0,Ross Barkley,29,England,OGC Nice,Luton Town,"Aug 9, 2023",6.00m,
1,Sander Berge,25,Norway,Sheffield United,Burnley FC,"Aug 9, 2023",20.00m,13.95
2,Álex Gallar,31,Spain,Málaga CF,UD Ibiza,"Aug 9, 2023",700k,
3,Aimen Mahious,25,Algeria,USM Alger,Yverdon Sport FC,"Aug 9, 2023",750k,
4,Sergio Arribas,21,Spain,Real Madrid Castilla,UD Almería,"Aug 9, 2023",5.00m,6.0


In [126]:
data = data.fillna(0)


In [127]:
data.head()


Unnamed: 0,Player,Age,Nationality,Left Team,Joined Team,Transfer Date,Market Value,Fee
0,Ross Barkley,29,England,OGC Nice,Luton Town,"Aug 9, 2023",6.00m,0.0
1,Sander Berge,25,Norway,Sheffield United,Burnley FC,"Aug 9, 2023",20.00m,13.95
2,Álex Gallar,31,Spain,Málaga CF,UD Ibiza,"Aug 9, 2023",700k,0.0
3,Aimen Mahious,25,Algeria,USM Alger,Yverdon Sport FC,"Aug 9, 2023",750k,0.0
4,Sergio Arribas,21,Spain,Real Madrid Castilla,UD Almería,"Aug 9, 2023",5.00m,6.0


In [128]:
data = data[(data['Fee'] != '?') & (data['Fee'] != '-')]

# Reset the index after dropping the rows
data = data.reset_index(drop=True)

In [130]:
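data['Fee'] = data['Fee'].astype(float)

def divide_fee(value):
    # Heuristic: with the suffixes stripped, values above 100 are assumed to
    # have been in thousands ('k') and are converted to millions.
    # A fee above 100m, or one of 100k or less, would be misclassified.
    if value > 100:
        return value / 1000
    else:
        return value

# Apply the function to the 'Fee' column
data['Fee'] = data['Fee'].apply(divide_fee)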
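
The threshold rule above has to guess the unit because the suffix has already been removed. As an alternative, here is a minimal sketch that reads the unit from the suffix before stripping it. The helper `parse_fee_millions` and the sample values are hypothetical. It assumes the only formats are `…m`, `…k`, `free transfer` and `loan transfer`, and maps anything else to NaN.

```python
import pandas as pd

def parse_fee_millions(raw):
    """Return a fee string such as '13.95m' or '700k' as a float in millions."""
    text = str(raw).strip().lower()
    if text in {"free transfer", "loan transfer"}:
        return 0.0
    if text.endswith("m"):
        return float(text[:-1])
    if text.endswith("k"):
        return float(text[:-1]) / 1000  # thousands -> millions
    return float("nan")  # unrecognised placeholders such as '?' or '-'

# Hypothetical sample values in the same format as the 'Fee' column
sample = pd.Series(["13.95m", "700k", "free transfer", "?", "120.00m"])
fees = sample.map(parse_fee_millions)
print(fees.tolist())
```

With this approach `700k` becomes 0.7 and `120.00m` stays 120.0, whereas the threshold rule would divide the latter by 1000.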



In [132]:
data = data.drop(columns=['Market Value'])

In [134]:
data.head()

Unnamed: 0,Player,Age,Nationality,Left Team,Joined Team,Transfer Date,Fee
0,Ross Barkley,29,England,OGC Nice,Luton Town,"Aug 9, 2023",0.0
1,Sander Berge,25,Norway,Sheffield United,Burnley FC,"Aug 9, 2023",13.95
2,Álex Gallar,31,Spain,Málaga CF,UD Ibiza,"Aug 9, 2023",0.0
3,Aimen Mahious,25,Algeria,USM Alger,Yverdon Sport FC,"Aug 9, 2023",0.0
4,Sergio Arribas,21,Spain,Real Madrid Castilla,UD Almería,"Aug 9, 2023",6.0
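
The `Transfer Date` column is still stored as text at this point. If real dates are wanted downstream, a sketch like the following could convert it. It assumes every value follows the `Aug 9, 2023` pattern seen in the output above and that month abbreviations are in English. The second sample date is invented for illustration.

```python
import pandas as pd

# First value comes from the output above; the second is a made-up example
dates = pd.Series(["Aug 9, 2023", "Jul 31, 2023"])

# %b = abbreviated month name, %d = day of month, %Y = four-digit year
parsed = pd.to_datetime(dates, format="%b %d, %Y")

print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

In the notebook this would amount to `data['Transfer Date'] = pd.to_datetime(data['Transfer Date'], format="%b %d, %Y")`, which raises an error if any value does not match the format.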


In [137]:
# Replace 'localhost' with the IP address or hostname of your MongoDB server
client = MongoClient('localhost', 27017)

# Replace 'transfers_db' with the name of the database you want to use
db = client['transfers_db']
# Convert the DataFrame to a list of dicts, one per row
data_list = data.to_dict('records')
# Replace 'players' with the name of the collection you want to use
collection = db['players']

# Insert the records into the collection
collection.insert_many(data_list)


<pymongo.results.InsertManyResult at 0x2551a0c5fc0>
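
To make the shape of the inserted documents concrete, the sketch below builds a two-row DataFrame from values shown earlier (a subset of the columns, chosen for brevity) and converts it the same way, without connecting to MongoDB.

```python
import pandas as pd

# Two rows taken from the head() output above, with a subset of the columns
sample = pd.DataFrame({
    "Player": ["Ross Barkley", "Sander Berge"],
    "Age": [29, 25],
    "Fee": [0.0, 13.95],
})

# One dict per row, keyed by column name: the format insert_many expects
records = sample.to_dict("records")
print(records[0])
```

Note that `insert_many` adds an `_id` field to each dict that lacks one, and that re-running the insertion cell inserts the same rows again, so clearing the collection first may be worthwhile when experimenting.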

In [138]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 151 entries, 0 to 150
Data columns (total 7 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Player         151 non-null    object 
 1   Age            151 non-null    int64  
 2   Nationality    151 non-null    object 
 3   Left Team      151 non-null    object 
 4   Joined Team    151 non-null    object 
 5   Transfer Date  151 non-null    object 
 6   Fee            151 non-null    float64
dtypes: float64(1), int64(1), object(5)
memory usage: 8.4+ KB
