# 2. Exploratory data analysis - In-depth profiling

The first step in a data preparation pipeline is exploratory data analysis (EDA). Data exploration and data cleansing go hand in hand: each informs the other, so the two steps are repeated iteratively.

*But what does data exploration include? And how can we explore the data more effectively, given that we are building a credit scorecard model?*


Data exploration includes both univariate and bivariate analysis and ranges from univariate statistics and frequency distributions to correlations, cross-tabulation, and characteristic analysis.

This notebook automates much of that exploration with a data profile: a `ProfileReport` is computed over the dataset and exported as an HTML report.
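
As a rough illustration, the analyses listed above can be sketched with plain pandas on a small table. The toy `loans` data, its column names, and the `weight_of_evidence` helper are hypothetical. Reading "characteristic analysis" as weight of evidence (WoE) with the convention ln(share of goods / share of bads) is also an assumption; it is one common choice in scorecard work, not something this notebook specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy credit data; names and values are illustrative only.
loans = pd.DataFrame(
    {
        "age": [23, 35, 47, 52, 29, 41, 38, 60, 26, 33],
        "income": [21_000, 48_000, 61_000, 75_000, 30_000,
                   52_000, 45_000, 80_000, 24_000, 39_000],
        "home_owner": ["no", "yes", "yes", "yes", "no",
                       "yes", "no", "yes", "no", "no"],
        "default": [1, 0, 0, 1, 1, 0, 1, 0, 1, 0],  # 1 = bad, 0 = good
    }
)

# Univariate: summary statistics and a relative frequency distribution.
age_stats = loans["age"].describe()
owner_freq = loans["home_owner"].value_counts(normalize=True)

# Bivariate: pairwise correlations and a cross-tabulation against the target.
corr = loans[["age", "income", "default"]].corr()
xtab = pd.crosstab(loans["home_owner"], loans["default"])


def weight_of_evidence(df, feature, target):
    """Per-category WoE = ln(share of goods / share of bads).

    Assumes a binary target (0 = good, 1 = bad) and no empty cells;
    a zero count would produce an infinite WoE.
    """
    counts = pd.crosstab(df[feature], df[target])
    table = pd.DataFrame(
        {
            "good_share": counts[0] / counts[0].sum(),
            "bad_share": counts[1] / counts[1].sum(),
        }
    )
    table["woe"] = np.log(table["good_share"] / table["bad_share"])
    return table


woe_table = weight_of_evidence(loans, "home_owner", "default")
woe = woe_table["woe"]

# Information value: sum over categories of (good share - bad share) * WoE.
information_value = (
    (woe_table["good_share"] - woe_table["bad_share"]) * woe
).sum()

print(xtab)
print(woe_table)
```

On real data, numeric features would first be binned (for example with `pd.cut` or `pd.qcut`) so that WoE can be computed per bin.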

## Read the data & computed metadata

### Import needed packages

In [None]:
import os
from pickle import load

import pandas as pd
from ydata.labs.datasources import DataSources
from ydata.metadata import Metadata
from ydata.profiling import ProfileReport
from ydata.utils.formats import read_json

In [None]:
dataset = DataSources.get(uid="973d95c7-e6bd-4535-a0ea-d3dd1e893b13").read()

In [None]:
meta = Metadata.load("metadata.pkl")
print(meta)

## Generating the full data profile

In [None]:
data_path = os.environ.get("DATASET_PATH", "")
data_name = os.environ.get("DATASET_NAME", "")

In [None]:
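print(f"Profile Name: {data_name}_profile")
# Profile the dataset loaded from the data source above.
profile = ProfileReport(df=dataset, title="Data profiling")
# Hide the navigation bar in the rendered HTML report.
profile.config.html.navbar_show = False

# Save the report as a standalone HTML file.
profile.to_file(f"{data_name}_profile.html")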
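
The report file name above is built from `DATASET_NAME`, which defaults to an empty string and would then yield `_profile.html`. A minimal sketch of a more defensive way to build the name is shown here; the `profile_path` helper, its `default_name` fallback, and the `out_dir` parameter are hypothetical and not part of the notebook.

```python
import os
from pathlib import Path


def profile_path(env=None, default_name="dataset", out_dir="."):
    """Return the HTML report path, falling back to a default name.

    `env` is any mapping of environment variables (defaults to os.environ),
    which keeps the helper easy to exercise without touching the real
    environment.
    """
    env = os.environ if env is None else env
    # Treat a missing, empty, or whitespace-only name as "not provided".
    name = env.get("DATASET_NAME", "").strip() or default_name
    return Path(out_dir) / f"{name}_profile.html"


print(profile_path({"DATASET_NAME": "credit"}).name)
print(profile_path({}).name)
```

The returned path could then be passed to `profile.to_file(...)` in place of the inline f-string.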

In [None]:
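import json

# Outputs to display in the pipeline UI: a CSV table and the HTML profile.
# `ratio_labels` is not defined in this notebook; it is assumed to be a
# DataFrame created in an earlier step. Note that `index=True` writes the
# index as an extra first column that `header` does not name.
metadata = {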
    "outputs": [
        {
            "type": "table",
            "storage": "inline",
            "format": "csv",
            "header": list(ratio_labels.columns),
            "source": ratio_labels.to_csv(header=False, index=True),
        },
        {
            "type": "web-app",
            "storage": "inline",
            "source": profile.to_html(),
        },
    ]
}

with open("mlpipeline-ui-metadata.json", "w") as metadata_file:
    json.dump(metadata, metadata_file)
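
The structure written above (a file named `mlpipeline-ui-metadata.json` holding an `outputs` list of `table` and `web-app` entries) appears to follow the Kubeflow Pipelines output viewer convention. The sketch below rebuilds the same structure with the standard library only, and checks that each CSV row matches the header. The `table_output` and `web_app_output` helpers and the sample values are hypothetical stand-ins for `ratio_labels` and the profile HTML.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def table_output(header, rows):
    """Build an inline CSV table entry; every row must match the header."""
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row {row!r} does not match header {header!r}")
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return {
        "type": "table",
        "storage": "inline",
        "format": "csv",
        "header": list(header),
        "source": buffer.getvalue(),
    }


def web_app_output(html):
    """Build an inline web-app entry from an HTML string."""
    return {"type": "web-app", "storage": "inline", "source": html}


# Hypothetical values standing in for the notebook's table and report HTML.
header = ["label", "ratio"]
rows = [["good", 0.8], ["bad", 0.2]]
ui_metadata = {
    "outputs": [
        table_output(header, rows),
        web_app_output("<html><body>profile placeholder</body></html>"),
    ]
}

# Write to a temporary directory and read back to confirm it is valid JSON.
output_path = Path(tempfile.mkdtemp()) / "mlpipeline-ui-metadata.json"
output_path.write_text(json.dumps(ui_metadata))
reloaded = json.loads(output_path.read_text())
print(reloaded["outputs"][0]["source"])
```

Validating the row length up front surfaces a header mismatch when the file is written rather than when the table is rendered.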