# ydata-profiling Tutorial
The Flash team is excited to share a short tutorial on ydata-profiling.
Before starting, we recommend reading the [README](README.md) to get familiar with ydata-profiling and its pros and cons.

With that said, let's work through a small example that explores the Census Income dataset.

## Import libraries

In [None]:
import pandas as pd
from ucimlrepo import fetch_ucirepo 
from ydata_profiling import ProfileReport

## Import data

In this example we will use the Census Income dataset, a classification dataset used to predict whether someone's income exceeds $50K per year based on demographic features.

In [None]:
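# Fetch the dataset from the UCI Machine Learning Repository (id=2)
income_census_raw = fetch_ucirepo(id=2)
# Build a DataFrame from the features, then attach the target as an 'income' column
income_census = pd.DataFrame(income_census_raw.data.features, columns=income_census_raw.feature_names)
income_census['income'] = income_census_raw.data.targets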
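Before filtering on `income`, it can help to check which label values are actually present. The sketch below assumes that some labels may carry stray whitespace or a trailing period (for example `>50K.`), which would make an exact match such as `== '>50K'` silently skip rows. The `toy` frame is a hypothetical stand-in for the fetched data, not the real dataset.

```python
import pandas as pd

# Hypothetical sample standing in for the fetched 'income' column.
toy = pd.DataFrame({"income": ["<=50K", ">50K.", "<=50K.", " >50K", None]})

# Strip surrounding whitespace and any trailing period; missing values stay missing.
toy["income"] = toy["income"].str.strip().str.rstrip(".")

# Count the remaining distinct labels (missing values are excluded by default).
label_counts = toy["income"].value_counts()
print(label_counts)
```

Running `income_census["income"].value_counts(dropna=False)` on the real data shows whether this kind of cleanup is needed at all.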

In [None]:
# Build the profiling report; full_width lets the HTML report use the whole page width
profile = ProfileReport(
    income_census,
    title="Adult income census Dataset",
    html={"style": {"full_width": True}},
)

In [None]:
profile.to_file("reports/income_census_dataset_report.html")
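The report above is written into a `reports/` directory. This tutorial does not establish whether `to_file` creates missing parent directories, so the sketch below assumes the directory may not exist yet and creates it up front. The names `report_dir` and `report_path` are illustrative only.

```python
from pathlib import Path

# Create the output directory if it is missing; do nothing if it already exists.
report_dir = Path("reports")
report_dir.mkdir(parents=True, exist_ok=True)

# Build the target path for the HTML report inside that directory.
report_path = report_dir / "income_census_dataset_report.html"
print(report_path)
```

The resulting `report_path` can then be passed to `profile.to_file(report_path)` in place of the hard-coded string.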

## Dataset comparison

It is also possible to compare two (or more) datasets using the `compare` function.
This can be useful when comparing two subsets or two versions of a dataset.
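When the two subsets are meant to be complementary, deriving both from a single boolean mask keeps the filters from drifting apart. The sketch below uses a hypothetical `toy` frame and made-up values. The names `is_above`, `above_50k`, and `below_50k` are illustrative.

```python
import pandas as pd

# Hypothetical toy data standing in for the income census DataFrame.
toy = pd.DataFrame(
    {
        "age": [25, 52, 38, 44],
        "income": ["<=50K", ">50K", "<=50K", ">50K"],
    }
)

# One mask defines the split; its negation gives the complementary subset.
# Note: rows with a missing label compare as False, so they land in below_50k.
is_above = toy["income"] == ">50K"
above_50k = toy[is_above]
below_50k = toy[~is_above]

print(len(above_50k), len(below_50k))
```

Each subset can then be passed to its own `ProfileReport` before the comparison.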

In [None]:
# Profile the rows whose income is at most 50K
below_50k_report = ProfileReport(
    income_census[income_census["income"] == "<=50K"],
    title="Below 50k Report",
)

# Profile the rows whose income is above 50K
above_50k_report = ProfileReport(
    income_census[income_census["income"] == ">50K"],
    title="Above 50k Report",
)

comparison_report = below_50k_report.compare(above_50k_report)
comparison_report.config.html.style.primary_colors = ["#FCC445", "#57ACD9"]
comparison_report.to_file("reports/income_census_dataset_comparison.html")
