# **Introduction**
>The June edition of the 2022 Tabular Playground series is all about data imputation. The dataset has similarities to the May 2022 Tabular Playground, except that there are no targets. Rather, there are missing data values in the dataset, and your task is to predict what these values should be.

# **Importing Data**

In [None]:
import numpy as np 
import pandas as pd 
import seaborn as sns
import matplotlib.pyplot as plt
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))



In [None]:
data = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2022/data.csv")
data.head()

# **Data Exploration**

In [None]:
print("dataset shape:\n",data.shape)
print("*********************************************************************************************************************************")
print("dataset stats:\n",data.describe())
print("*********************************************************************************************************************************")
print("dataset info:")
data.info()  # info() prints its report itself and returns None, so it is not wrapped in print()
print("*********************************************************************************************************************************")
# Count fully duplicated rows instead of printing the whole de-duplicated frame
print("duplicate rows:\n", data.duplicated().sum())
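
Because the task is imputation, it can also help to see where the gaps are before filling them. The sketch below counts missing cells per column on a small made-up frame; the name `toy` and its values are placeholders, not the competition data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the competition data (values and column names are made up)
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 2.0, 1.0],
    "c": [5, 6, 7, 8],
})

# Missing cells per column, keeping only columns that have any
missing_per_col = toy.isna().sum()
missing_per_col = missing_per_col[missing_per_col > 0]

# Total number of cells that need to be imputed
total_missing = int(toy.isna().sum().sum())

print(missing_per_col)
print("total missing cells:", total_missing)
```

The same two expressions should work on `data` unchanged, since they rely only on `DataFrame.isna()`.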

# **Method 3: Iterative Imputer**

In [None]:
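# IterativeImputer is still experimental; this import enables it and must come before the next one
from sklearn.experimental import enable_iterative_imputer  # noqa: F401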
from sklearn.impute import IterativeImputer
from xgboost import XGBRegressor

In [None]:
imp = IterativeImputer(
    estimator=XGBRegressor(
        n_estimators=1000, learning_rate=0.05, n_jobs=4,
    ),
    missing_values=np.nan,
    max_iter=10,
    initial_strategy='mean',
    imputation_order='ascending',
)



In [None]:
# Fit the imputer and write the filled values back in place, keeping the column names and index
data[:] = imp.fit_transform(data)
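
To illustrate what `IterativeImputer` does, here is a minimal sketch on a tiny made-up array. It swaps the `XGBRegressor` above for scikit-learn's default estimator (`BayesianRidge`) so that it runs quickly; `X`, `toy_imp` and `X_filled` are hypothetical names used only for this example.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up data in which the second column is twice the first
X = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, np.nan],
    [4.0, 8.0],
    [np.nan, 10.0],
])

# Default estimator (BayesianRidge) is an assumption made to keep the sketch light
toy_imp = IterativeImputer(max_iter=10, initial_strategy="mean", random_state=0)
X_filled = toy_imp.fit_transform(X)

# Cells that were observed in the input are left untouched by the imputer
observed = ~np.isnan(X)
print(X_filled)
```

Each feature with missing values is modelled as a function of the other features, and the fitted model predicts the missing cells; the process repeats for up to `max_iter` rounds.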

# **Submission**

In [None]:
submission = pd.read_csv("/kaggle/input/tabular-playground-series-jun-2022/sample_submission.csv", index_col='row-col')
submission.head()

In [None]:
submission.shape

In [None]:
from tqdm import tqdm

# Each submission index has the form '<row>-<col>'; copy the imputed value for that cell
for i in tqdm(submission.index):
    row, col = i.split('-', 1)
    submission.loc[i, 'value'] = data.loc[int(row), col]
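
The row-by-row loop above can be slow on a large submission file. A possible vectorized alternative is sketched below on made-up frames; `toy_data` and `toy_sub` are hypothetical stand-ins for `data` and `submission`. It assumes the row labels and column names are unique and that column names contain no hyphen.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the imputed data and the sample submission (made-up values)
toy_data = pd.DataFrame({"F_1": [0.1, 0.2, 0.3], "F_2": [1.0, 2.0, 3.0]})
toy_sub = pd.DataFrame(
    {"value": 0.0},
    index=pd.Index(["0-F_2", "2-F_1", "1-F_2"], name="row-col"),
)

# Split every 'row-col' key once into a row label and a column name
keys = toy_sub.index.to_series().str.split("-", n=1, expand=True)
row_pos = toy_data.index.get_indexer(keys[0].astype(int))
col_pos = toy_data.columns.get_indexer(keys[1])

# Index the underlying array with both position arrays to pull all cells in one step
toy_sub["value"] = toy_data.to_numpy()[row_pos, col_pos]
print(toy_sub)
```

This does the same lookups as the loop, but resolves all positions up front and reads the values with a single NumPy indexing operation.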

In [None]:
submission.to_csv("submission.csv")
submission.head()