## Basic Data Exploration with Pandas

#### Initial Prompt:
"Teach me about Pandas"

##### Response:
Sure! Pandas is a powerful and flexible data analysis library for Python. It's widely used for data manipulation and analysis, providing data structures and functions needed to work with structured data seamlessly. Let's go through the basics step-by-step.

### 1. Installing Pandas
First, you need to install Pandas if you haven't already. You can do this using pip:
```bash
pip install pandas
```

### 2. Importing Pandas
To use Pandas in your Python code, you need to import it:
```python
import pandas as pd
```

### 3. Creating DataFrames
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). You can create a DataFrame from various data sources like lists, dictionaries, or CSV files.

**Example: Creating a DataFrame from a dictionary**
```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
```
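The paragraph above also mentions lists. Here is a minimal sketch of two list-based layouts that reuses the same three people. The variable names `rows`, `records`, `df_from_lists`, and `df_from_records` are illustrative only:

```python
import pandas as pd

# A list of lists: each inner list is one row, so column names are passed separately
rows = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'Los Angeles'],
    ['Charlie', 35, 'Chicago'],
]
df_from_lists = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])

# A list of dictionaries: each dict is one row, and its keys become the column names
records = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'},
]
df_from_records = pd.DataFrame(records)

print(df_from_lists)
print(df_from_lists.equals(df_from_records))  # True: both build the same table
```

A list of lists needs `columns=` because the rows carry no labels. A list of dicts supplies the column names through its keys.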

### 4. Reading Data from Files
Pandas can read data from many sources, such as CSV files (`pd.read_csv`), Excel workbooks (`pd.read_excel`), and SQL databases (`pd.read_sql`).

**Example: Reading a CSV file**
```python
df = pd.read_csv('path/to/your/file.csv')
print(df.head())  # Display the first few rows
```
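Because `path/to/your/file.csv` is a placeholder, the sketch below is fully self-contained. It reads CSV text from memory with `io.StringIO`, then queries a throwaway in-memory SQLite database with `pd.read_sql_query`. The table name `people` and the CSV contents are made up for illustration:

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV contents held in memory so the example runs without a file on disk
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.head())

# An in-memory SQLite database stands in for a real SQL database
con = sqlite3.connect(':memory:')
df_csv.to_sql('people', con, index=False)  # write a table so there is something to query
df_sql = pd.read_sql_query(
    'SELECT Name, Age FROM people WHERE Age >= 30 ORDER BY Age', con
)
con.close()
print(df_sql)
```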

### 5. Exploring Data
You can explore and summarize your data using various Pandas functions.

**Example: Displaying basic information**
```python
df.info()  # Print column names, non-null counts, and dtypes (info() prints directly and returns None)
print(df.describe())  # Display summary statistics
print(df.head())  # Display the first few rows
print(df.tail())  # Display the last few rows
```
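A few more attributes and methods are useful for a first look at unfamiliar data. Here is a sketch on a small made-up DataFrame that contains one missing value:

```python
import pandas as pd

# Small illustrative DataFrame (not from the original text) with one missing Age
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, None, 35],
    'City': ['New York', 'Chicago', 'Chicago', 'Boston'],
})

print(df.shape)                   # (rows, columns)
print(df.dtypes)                  # data type of each column
print(df.isna().sum())            # count of missing values per column
print(df['City'].value_counts())  # frequency of each distinct value
```

Note that `Age` becomes `float64`. Pandas stores the missing entry as `NaN`, which is a floating-point value.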

### 6. Data Manipulation
Pandas provides powerful tools for data manipulation, such as filtering, sorting, and grouping.

**Example: Filtering data**
```python
# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
```
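Filters can also combine several conditions. Python's `and`/`or` do not work element-wise on a Series, so pandas uses `&` and `|` instead. Each condition needs its own parentheses because those operators bind more tightly than comparisons. Here is a sketch using the same DataFrame as in section 3:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Combine conditions with & (and) or | (or); each condition is wrapped in parentheses
older_or_ny = df[(df['Age'] > 30) | (df['City'] == 'New York')]

# .loc filters rows and selects a column in one step
names_over_25 = df.loc[df['Age'] > 25, 'Name']

# .query() expresses a filter as a string
query_result = df.query('Age >= 30 and City != "Chicago"')

print(older_or_ny)
print(names_over_25)
print(query_result)
```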

**Example: Sorting data**
```python
# Sort by Age in descending order
sorted_df = df.sort_values(by='Age', ascending=False)
print(sorted_df)
```
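`sort_values` also accepts a list of columns, with a matching list for `ascending`. This lets you sort by one column and break ties with another. Here is a sketch on a made-up DataFrame where cities and ages repeat:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [30, 25, 30, 25],
    'City': ['Chicago', 'Boston', 'Boston', 'Chicago'],
})

# Sort by City (A to Z), then by Age (oldest first) within each city
sorted_df = df.sort_values(by=['City', 'Age'], ascending=[True, False])

# reset_index(drop=True) renumbers the rows 0..n-1 after sorting
sorted_df = sorted_df.reset_index(drop=True)
print(sorted_df)
```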

**Example: Grouping data**
```python
# Group by City and calculate the mean Age
grouped_df = df.groupby('City')['Age'].mean()
print(grouped_df)
```
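To compute several statistics per group at once, follow `groupby` with `.agg()`. The sketch below uses a made-up DataFrame with two people per city. It shows both a list of function names and named aggregation (available in pandas 0.25 and later):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, 40],
    'City': ['Chicago', 'Boston', 'Chicago', 'Boston'],
})

# A list of function names gives one output column per statistic
summary = df.groupby('City')['Age'].agg(['mean', 'min', 'max', 'count'])

# Named aggregation chooses the output column names explicitly
named = df.groupby('City').agg(mean_age=('Age', 'mean'), people=('Name', 'count'))

print(summary)
print(named)
```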

### 7. Handling Missing Data
Pandas provides functions to handle missing data, such as filling or dropping missing values.

**Example: Filling missing values**
```python
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fill missing Age values with the mean (assign back instead of chained inplace)
print(df)
```
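The snippet above assumes `df` already contains gaps. The self-contained sketch below creates some first, then passes a dict to `fillna` so each column gets its own replacement. The placeholder `'Unknown'` is an arbitrary choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, np.nan, 35],
    'City': ['New York', None, 'Chicago'],
})

print(df.isna().sum())  # how many values are missing in each column

# fillna returns a new DataFrame; the dict maps each column to its fill value
filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(filled)
```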

**Example: Dropping missing values**
```python
df.dropna(inplace=True)  # Drop rows with any missing values
print(df)
```
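Called without arguments, `dropna` can discard more rows than intended. Here is a sketch of its `subset`, `thresh`, and `how` parameters on a made-up DataFrame with missing values scattered across rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', None],
    'Age': [25, np.nan, 35, np.nan],
    'City': ['New York', 'Boston', None, None],
})

any_missing = df.dropna()                # drop rows with at least one missing value
age_missing = df.dropna(subset=['Age'])  # drop rows only where Age is missing
kept_thresh = df.dropna(thresh=2)        # keep rows with at least 2 non-missing values
all_missing = df.dropna(how='all')       # drop rows where every value is missing

print(any_missing)
print(age_missing)
print(kept_thresh)
print(all_missing)
```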

### 8. Merging and Joining DataFrames
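You can combine DataFrames with `pd.merge()` (SQL-style joins on shared columns), `DataFrame.join()` (joins on the index), and `pd.concat()` (stacking along an axis).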

**Example: Merging DataFrames**
```python
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [1, 2], 'C': [5, 6]})
merged_df = pd.merge(df1, df2, on='A')
print(merged_df)
```
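By default, `pd.merge` performs an inner join, keeping only keys present in both DataFrames. The sketch below shows other `how` options plus `pd.concat`, using extended versions of `df1` and `df2` whose keys only partly overlap:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [3, 4, 5]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'C': [5, 6, 7]})

inner = pd.merge(df1, df2, on='A')             # default how='inner': only keys in both
left = pd.merge(df1, df2, on='A', how='left')  # all rows of df1; C is NaN where no match
# how='outer' keeps every key; indicator=True adds a _merge column showing each row's source
outer = pd.merge(df1, df2, on='A', how='outer', indicator=True)

# concat stacks DataFrames with the same columns on top of each other
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
print(outer)
print(stacked)
```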

### 9. Saving Data to Files
You can save your DataFrame to various file formats, such as CSV (`to_csv`), Excel (`to_excel`), and JSON (`to_json`).

**Example: Saving to a CSV file**
```python
df.to_csv('path/to/save/file.csv', index=False)
```
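The sketch below checks the saved output without touching the disk. It writes to an in-memory buffer, which stands in for a real file path, and reads the result back:

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Write to an in-memory buffer instead of a real path
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the 0, 1, ... row labels
print(buffer.getvalue())

# Rewind the buffer and read the text back into a DataFrame
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip.equals(df))
```

Without `index=False`, the row labels are written as an extra column, which comes back as `Unnamed: 0` when the file is read again.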

### 10. Plotting Data
Pandas integrates well with plotting libraries like Matplotlib for data visualization.

**Example: Plotting data**
```python
import matplotlib.pyplot as plt

df['Age'].plot(kind='hist')
plt.show()
```
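When a script runs without a display (for example, on a server), you can save the chart to an image instead of showing it. The sketch below plots the mean age per city as a bar chart using Matplotlib's non-interactive `Agg` backend. The output filename is hypothetical:

```python
import matplotlib

matplotlib.use('Agg')  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'City': ['Chicago', 'Boston', 'Chicago', 'Boston'],
    'Age': [25, 30, 35, 40],
})

# Plot the mean Age per City as a bar chart; pandas returns a Matplotlib Axes
ax = df.groupby('City')['Age'].mean().plot(kind='bar', title='Mean age by city')
ax.set_ylabel('Age')

plt.tight_layout()
plt.savefig('mean_age_by_city.png')  # hypothetical output filename
plt.close()
```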

These are the basics of Pandas. As you get more comfortable, you can explore more advanced features and functionalities. If you have any questions or need further clarification, feel free to ask!

### Learning Outcomes for Pandas

1. Installing and Importing Pandas
    - I understand how to install Pandas with `pip install pandas` and import it with `import pandas as pd`.

2. Creating DataFrames
    - I can create DataFrames using data sources like lists, dictionaries or CSV files.

3. Reading Data from Files
    - Pandas reads data from files such as CSV into a DataFrame, which I can then preview with `head()`.

4. Exploring Data
    - I can summarize data using the `info()`, `describe()`, `head()`, and `tail()` methods.

5. Data Manipulation
    - Data can be filtered based on different conditions.
    - Data can be sorted by one or more columns, and grouped so that functions such as the mean can be calculated for each group.

## When Scikit-Learn Datasets Meet Pandas
The dataset I have chosen for this task is the Wine dataset, loaded with scikit-learn's `load_wine()`. I chose it because it will allow me to practice classification techniques.

This dataset contains samples of wines with various chemical properties such as alcohol content, malic acid, ash, and colour intensity.
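For classification, it helps to know how many samples each class has before modelling. The sketch below uses `load_wine(as_frame=True)`, an option in recent scikit-learn versions that returns the data as a pandas DataFrame with a `target` column already attached. The name `wine_df` is illustrative:

```python
from sklearn.datasets import load_wine

# as_frame=True returns the features and target as pandas objects
wine = load_wine(as_frame=True)
wine_df = wine.frame  # features plus a 'target' column in one DataFrame

print(wine_df.shape)                                   # (samples, 13 features + target)
print(wine_df['target'].value_counts().sort_index())  # samples per wine class
print(list(wine.target_names))
```

The three classes are not the same size, which is worth keeping in mind when splitting the data into training and test sets.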

```python
import pandas as pd
from sklearn.datasets import load_wine

wine = load_wine()

df = pd.DataFrame(data=wine.data,
                  columns=wine.feature_names)

df['target'] = wine.target

print(df.head())
```

   alcohol  malic_acid   ash  alcalinity_of_ash  magnesium  total_phenols  \
0    14.23        1.71  2.43               15.6      127.0           2.80   
1    13.20        1.78  2.14               11.2      100.0           2.65   
2    13.16        2.36  2.67               18.6      101.0           2.80   
3    14.37        1.95  2.50               16.8      113.0           3.85   
4    13.24        2.59  2.87               21.0      118.0           2.80   

   flavanoids  nonflavanoid_phenols  proanthocyanins  color_intensity   hue  \
0        3.06                  0.28             2.29             5.64  1.04   
1        2.76                  0.26             1.28             4.38  1.05   
2        3.24                  0.30             2.81             5.68  1.03   
3        3.49                  0.24             2.18             7.80  0.86   
4        2.69                  0.39             1.82             4.32  1.04   

   od280/od315_of_diluted_wines  proline  target  
0          