# Example
## Introduction

**Datalizer** is a Python package designed to simplify early-stage data analysis.

Its goal is to help users:
- Load and validate structured datasets
- Detect and resolve common data issues (e.g. missing values, duplicates)
- Prepare numerical data for modeling

## Imports

In [1]:
```python
import datalizer as dl
```

## Loading in Data

The first step in any data project is to load your dataset.  
Datalizer provides a convenient function, `load_data()`, that reads `.csv`, `.xlsx`, or `.json` files and checks that the dataset is entirely numerical.

If non-numeric data is detected, `load_data()` raises an error, which helps you catch issues early in the pipeline.

In [2]:
```python
# Specify a file path
file_path = "sample_numerical.csv"

# Load a sample dataset
df = dl.load_data(file_path=file_path)

# Show the first few rows
df.head()
```

|   | age | weight | height |
|---|----:|-------:|-------:|
| 0 | 25 | 68 | 175.0 |
| 1 | 30 | 75 | 180.0 |
| 2 | 22 | 60 | 165.0 |
| 3 | 35 | 80 | 185.0 |
| 4 | 35 | 80 | 185.0 |
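
To make the numeric check concrete, here is a minimal sketch of a numeric-only CSV loader written with pandas, assuming (as the `df.head()` call suggests) that Datalizer returns a pandas DataFrame. `load_numeric` is a hypothetical name, not part of Datalizer's API, and unlike `load_data()` the sketch handles only CSV input.

```python
import io

import pandas as pd
from pandas.api.types import is_numeric_dtype


def load_numeric(source) -> pd.DataFrame:
    """Hypothetical stand-in for dl.load_data(): read a CSV, reject non-numeric columns."""
    df = pd.read_csv(source)
    # Any column pandas could not parse as numbers (e.g. text) has a non-numeric dtype.
    bad = [col for col in df.columns if not is_numeric_dtype(df[col])]
    if bad:
        raise ValueError(f"Non-numeric columns found: {bad}")
    return df


# Inline sample data so the sketch runs without a file on disk;
# the empty height cell in the last row is read as NaN.
csv_text = "age,weight,height\n25,68,175\n30,75,180\n28,70,\n"
df = load_numeric(io.StringIO(csv_text))
print(df.dtypes)
```

A column containing an empty cell is parsed as `float64`, because `NaN` is a float value. That is likely why `height` is displayed as `175.0` in the output above while `age` and `weight` appear as integers.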


## Checking for Issues

Before cleaning a dataset, it's good practice to identify any existing problems.  
The `check_for_issues()` function quickly inspects your dataset and reports:

- The number of missing values
- The number of duplicate rows
- The problematic rows themselves, if there are any

This step helps you decide what kind of cleaning strategy to apply.

In [3]:
```python
dl.check_for_issues(df)
```


```text
Number of missing cells: 1

Rows with missing values:
   age  weight  height
5   28      70     NaN

Number of duplicate rows: 1

Duplicate rows:
   age  weight  height
4   35      80   185.0
```
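
The items `check_for_issues()` reports correspond to standard pandas operations. The sketch below is an assumption about how such a report could be produced, not Datalizer's actual implementation; `report_issues` is a hypothetical helper, and the sample frame is built only from the rows visible in the outputs above.

```python
import numpy as np
import pandas as pd


def report_issues(df: pd.DataFrame) -> dict:
    """Hypothetical sketch of the checks dl.check_for_issues() reports."""
    missing_mask = df.isna()
    n_missing = int(missing_mask.sum().sum())         # total number of missing cells
    rows_with_missing = df[missing_mask.any(axis=1)]  # rows containing at least one NaN
    dup_mask = df.duplicated(keep="first")            # flags repeats after the first occurrence
    return {
        "missing_cells": n_missing,
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": int(dup_mask.sum()),
        "duplicates": df[dup_mask],
    }


df = pd.DataFrame(
    {
        "age": [25, 30, 22, 35, 35, 28],
        "weight": [68, 75, 60, 80, 80, 70],
        "height": [175.0, 180.0, 165.0, 185.0, 185.0, np.nan],
    }
)
issues = report_issues(df)
print("Number of missing cells:", issues["missing_cells"])
print(issues["rows_with_missing"])
print("Number of duplicate rows:", issues["duplicate_rows"])
print(issues["duplicates"])
```

With `keep="first"`, only the second copy of a repeated row is flagged, which is consistent with the output above listing row 4 (not row 3) as the duplicate.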


## Cleaning the Dataset

Once issues are detected, you can clean the dataset using `clean_basic()`.

This function:
- Removes duplicate rows
- Handles missing values based on a selected strategy:
  - `"mean"` – fill missing values with column means
  - `"median"` – fill with column medians
  - `"mode"` – fill with most frequent values
  - `"drop"` – remove rows with missing values entirely

By default, `clean_basic()` returns a new cleaned DataFrame without modifying the original.

In [4]:
```python
df = dl.clean_basic(df, strategy="drop")

print("\nData after cleaning:")
df.head()
```


```text
Missing values detected. Cleaning with strategy: 'drop'.

Data after cleaning:
```


|   | age | weight | height |
|---|----:|-------:|-------:|
| 0 | 25 | 68 | 175.0 |
| 1 | 30 | 75 | 180.0 |
| 2 | 22 | 60 | 165.0 |
| 3 | 35 | 80 | 185.0 |
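
The four cleaning strategies can be approximated with pandas as follows. This is a hypothetical sketch (`clean_sketch` is not a Datalizer function), assuming that duplicates are removed before missing values are handled, as the order of the bullet list describing `clean_basic()` suggests.

```python
import numpy as np
import pandas as pd


def clean_sketch(df: pd.DataFrame, strategy: str = "mean") -> pd.DataFrame:
    """Hypothetical approximation of dl.clean_basic(); returns a new DataFrame."""
    out = df.drop_duplicates()  # new frame; the caller's df is left unchanged
    if strategy == "drop":
        return out.dropna()
    if strategy == "mean":
        return out.fillna(out.mean())
    if strategy == "median":
        return out.fillna(out.median())
    if strategy == "mode":
        # mode() returns one row per tied value; take the first row as the fill values
        return out.fillna(out.mode().iloc[0])
    raise ValueError(f"Unknown strategy: {strategy!r}")


df = pd.DataFrame(
    {
        "age": [25, 30, 22, 35, 35, 28],
        "weight": [68, 75, 60, 80, 80, 70],
        "height": [175.0, 180.0, 165.0, 185.0, 185.0, np.nan],
    }
)
dropped = clean_sketch(df, strategy="drop")
filled = clean_sketch(df, strategy="mean")
print(dropped)
print(filled.loc[5, "height"])  # 176.25: mean of 175, 180, 165, 185
```

The order matters for the fill strategies: after deduplication the mean of the four known heights is 176.25, whereas leaving the duplicate 185.0 in place would give 178.0.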
