# NBA Value Data: Basic Cleaning Steps

This notebook demonstrates *basic data cleaning steps* using the NBA Value dataset. We'll cover:
- Loading the data
- Inspecting for missing values, duplicates, and data types
- Making any simple corrections if needed

**This notebook performs no advanced analysis or exclusions; it covers only basic cleaning and documentation.**

## 1. Import libraries
We use [pandas](https://pandas.pydata.org/) for data manipulation and [numpy](https://numpy.org/) for basic numerical operations.

In [None]:
import pandas as pd
import numpy as np

## 2. Load the dataset
We load the NBA value data from the provided CSV file. The `.head()` method shows the first five rows by default, which gives a quick overview of the data.

In [None]:
# Update the path if running elsewhere
df = pd.read_csv('cleaned_nba_value_data.csv')
df.head()

## 3. Inspect the data
Let's check the shape, columns, and data types of our dataset to understand its structure.

In [None]:
print('Shape:', df.shape)
print('Columns:', df.columns.tolist())
print('\nData types:')
print(df.dtypes)

## 4. Check for missing values
It's important to identify missing data early on. We use `.isnull().sum()` to see how many missing values are in each column.

In [None]:
df.isnull().sum()
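
The counts above are easier to interpret next to the share of rows they represent. The following is a minimal sketch on a small made-up DataFrame; the column names (`Player`, `Salary`, `Points`) and values are assumptions for illustration and are not taken from the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only
toy = pd.DataFrame({
    "Player": ["A", "B", "C", "D"],
    "Salary": [1_000_000.0, np.nan, 2_500_000.0, np.nan],
    "Points": [10.5, 8.0, np.nan, 12.1],
})

# Missing-value count and percentage for each column
missing_summary = pd.DataFrame({
    "missing_count": toy.isnull().sum(),
    "missing_pct": toy.isnull().mean() * 100,
})
print(missing_summary)
```

Replacing `toy` with `df` should produce the same kind of summary for the NBA data.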

## 5. Check for duplicates
Duplicate rows can distort analysis, so it's good practice to check for them. `.duplicated()` flags every row that exactly repeats an earlier row, and `.sum()` counts the flagged rows.

In [None]:
print('Number of duplicate rows:', df.duplicated().sum())
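
If the count is not zero, it helps to look at the repeated rows before deciding what to do with them. The sketch below uses a made-up DataFrame (the columns `Player`, `Season`, and `Salary` are hypothetical) and only inspects the duplicates; it does not drop anything.

```python
import pandas as pd

# Hypothetical toy data: the first and third rows are identical
toy = pd.DataFrame({
    "Player": ["A", "B", "A", "C"],
    "Season": [2020, 2020, 2020, 2021],
    "Salary": [1.0, 2.0, 1.0, 3.0],
})

# Default behaviour: only the repeats after the first occurrence are flagged
n_duplicates = toy.duplicated().sum()

# keep=False flags every member of a duplicate group, so all copies are visible
all_copies = toy[toy.duplicated(keep=False)]

print("Number of duplicate rows:", n_duplicates)
print(all_copies)
```

Passing `subset=[...]` to `.duplicated()` restricts the comparison to chosen columns, which can be useful when only some columns identify a record.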

## 6. Basic data type check
Correct data types help prevent errors and ensure calculations work as intended. Below, we display the data types again and check if they look appropriate for each column.

In [None]:
df.dtypes
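
Reading the `dtypes` output by eye works for a few columns, but the check can also be done programmatically. This is a rough sketch, assuming a hypothetical list of columns that are expected to be numeric; the column names and values below are invented for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical toy data: 'Salary' is deliberately stored as text
toy = pd.DataFrame({
    "Player": ["A", "B"],
    "Salary": ["$1,000", "$2,500"],
    "Points": [10.5, 8.0],
})

# Assumption: these are the columns we expect to hold numbers
expected_numeric = ["Salary", "Points"]

# Collect any expected-numeric column whose dtype is not numeric
non_numeric = [col for col in expected_numeric if not is_numeric_dtype(toy[col])]
print("Columns needing conversion:", non_numeric)
```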

## 7. Example: Data Type Conversion (if needed)
If any columns are not the expected type (for example, if salary is stored as a string with symbols), we would convert them to numeric format. In our case, data types already appear correct, but here's how you could convert a column to float as an example:

In [None]:
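# Example only: if 'Salary' were stored as a string such as '$1,000,000', convert it to float.
# The raw string (r'...') keeps the backslash in the regex from being read as a Python escape.
# df['Salary'] = df['Salary'].replace(r'[\$,]', '', regex=True).astype(float)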
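
To see the conversion run without touching the real data, here is a small self-contained sketch. The salary strings are made up, and `.str.replace` is used as one reasonable way to strip the symbols; `pd.to_numeric` with `errors="coerce"` is shown as an alternative when some values may not parse at all.

```python
import pandas as pd

# Hypothetical salary strings with dollar signs and thousands separators
salary_raw = pd.Series(["$1,000,000", "$2,500,000", "$750,000"], name="Salary")

# Strip '$' and ',' with a regex, then cast the cleaned strings to float
salary = salary_raw.str.replace(r"[\$,]", "", regex=True).astype(float)
print(salary)

# Alternative: unparseable entries become NaN instead of raising an error
coerced = pd.to_numeric(pd.Series(["12", "n/a"]), errors="coerce")
print(coerced)
```

After a coercing conversion, re-running the missing-value check shows how many entries failed to parse.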

## 8. Save cleaned file (optional)
If any changes were made, save the cleaned data to a new CSV file.

In [None]:
# If you made changes, you could save with:
# df.to_csv('nba_value_cleaned_basic.csv', index=False)
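
A quick way to confirm a save worked is to read the file back and compare its structure with the original. The sketch below writes a made-up DataFrame to a temporary directory so nothing is left on disk; the file name `toy_cleaned.csv` and the columns are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for the cleaned DataFrame
toy = pd.DataFrame({"Player": ["A", "B"], "Salary": [1000000.0, 2500000.0]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_cleaned.csv")
    # index=False avoids writing the row index as an extra column
    toy.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

# The reloaded data should have the same shape and column names
same_shape = reloaded.shape == toy.shape
same_columns = list(reloaded.columns) == list(toy.columns)
print("Same shape:", same_shape, "| Same columns:", same_columns)
```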

## Summary for Mentor Discussion
- **Cleaning steps performed:**
    - Inspected shape, columns, and data types
    - Checked for missing values (none found)
    - Checked for duplicate rows (none found)
    - Confirmed all columns have appropriate data types
- **Handling missing values:** None present
- **Outliers:** Not assessed at this stage (covered in later lessons)

No data was excluded or transformed beyond basic inspection in this assignment, as per instructions.