# 🐼 Pandas Tutorial: Feature Extraction

This notebook demonstrates how to extract features (episode count, airing period, and duration in months) from the "Title" column of the anime dataset and use them in a few simple analyses.


### Loading the Data

We start by importing necessary libraries and loading the dataset:


In [None]:
import numpy as np
import pandas as pd
df = pd.read_csv(r'anime.csv')
df.head()

### Extracting Episode Counts

We extract the number of episodes from the "Title" column using a custom function that returns the text between the first pair of parentheses:


In [None]:
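def extract_episodes(txt):
    """Return the text between the first "(" and the next ")"."""
    start = txt.find("(")
    end = txt.find(")", start + 1)
    if start == -1 or end == -1:
        return None  # no parenthesised part in this title
    return txt[start + 1:end]

df["Episodes"] = df["Title"].apply(extract_episodes)
# Drop the literal " eps" suffix so only the number remains
df["Episodes"] = df["Episodes"].str.replace(" eps", "", regex=False)
# Non-numeric or missing counts become <NA> instead of raising an error
df["Episodes"] = pd.to_numeric(df["Episodes"], errors="coerce").astype("Int64")
df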
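
The same extraction can also be sketched with pandas' vectorized string methods instead of a Python-level function. The snippet below is a minimal illustration on a made-up `sample` frame: the titles are hypothetical and only assume the `(<n> eps)` pattern that the function above relies on.

```python
import pandas as pd

# Hypothetical rows, assuming titles embed the count as "(<n> eps)"
sample = pd.DataFrame({
    "Title": [
        "Show A TV (64 eps) Apr 2009 - Jul 2010",
        "Show B TV (? eps) Oct 2023 - ",
        "Show C Movie (1 eps) Aug 2016 - Aug 2016",
    ]
})

# Capture whatever sits between "(" and " eps)"
raw = sample["Title"].str.extract(r"\((.+?) eps\)", expand=False)

# Non-numeric values such as "?" become missing; Int64 keeps integers next to <NA>
sample["Episodes"] = pd.to_numeric(raw, errors="coerce").astype("Int64")
print(sample[["Title", "Episodes"]])
```

Using the nullable `Int64` dtype is a design choice here: it keeps whole-number episode counts even when some rows have no usable value.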

### Extracting Total Time

Similarly, we extract the airing period, a `Mon YYYY - Mon YYYY` string that follows the closing parenthesis, from the "Title" column:


In [None]:
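def extraction_time(txt):
    # The period follows the first ")" and spans 19 characters ("Mon YYYY - Mon YYYY")
    end = txt.find(")")
    if end == -1:
        return None  # no closing parenthesis, so no period to extract
    # Slicing never raises, even when fewer than 19 characters remain
    return txt[end + 1:end + 20]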

df['Total Time'] = df['Title'].apply(extraction_time)
df.head()
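
A fixed 19-character slice depends on every title having exactly the same layout. As a hedged alternative, the period can be matched by pattern instead of by position. The sketch below uses a hypothetical `sample` frame and a hypothetical `period_pattern`; both assume the `Mon YYYY - Mon YYYY` format described above.

```python
import pandas as pd

# Hypothetical titles; the third one has no end date
sample = pd.DataFrame({
    "Title": [
        "Show A TV (64 eps) Apr 2009 - Jul 2010 1,000 members",
        "Show B TV (12 eps)Oct 2023 - Dec 2023",
        "Show C TV (? eps) Oct 2023 - ",
    ]
})

# ")" followed by optional whitespace, then "Mon YYYY - Mon YYYY"
period_pattern = r"\)\s*([A-Z][a-z]{2} \d{4} - [A-Z][a-z]{2} \d{4})"

# Rows that do not match the pattern get NaN rather than a partial string
sample["Total Time"] = sample["Title"].str.extract(period_pattern, expand=False)
print(sample["Total Time"])
```

Rows without a complete period end up as missing values, which makes them easy to find with `isna()`.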

### Calculating Duration in Months

Using the extracted time period, we calculate the total duration in months:


In [None]:
from dateutil.relativedelta import relativedelta
from datetime import datetime

def calculate_total_months(period):
    try:
        start_str, end_str = period.split(' - ')
        start_date = datetime.strptime(start_str, '%b %Y')
        end_date = datetime.strptime(end_str, '%b %Y')
        r = relativedelta(end_date, start_date)
        return r.years * 12 + r.months + 1  # +1 to include the starting month
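    except (AttributeError, TypeError, ValueError):
        # Missing or malformed period (not in "Mon YYYY - Mon YYYY" form)
        return None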

df['Months'] = df['Total Time'].apply(calculate_total_months)
df
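
For month-granular dates, the same inclusive month count can be computed without `dateutil`, using year and month arithmetic on whole columns. This is a minimal sketch on a hypothetical `periods` Series, not a replacement for the function above; it assumes the `Mon YYYY - Mon YYYY` format.

```python
import pandas as pd

# Hypothetical periods; the last one is deliberately malformed
periods = pd.Series(["Apr 2009 - Jul 2010", "Oct 2023 - Dec 2023", "unknown"])

# Split each period into start and end strings; non-matching rows become NaN
parts = periods.str.extract(r"^(\w{3} \d{4}) - (\w{3} \d{4})$")
start = pd.to_datetime(parts[0], format="%b %Y", errors="coerce")
end = pd.to_datetime(parts[1], format="%b %Y", errors="coerce")

# Difference in calendar months, +1 to include the starting month
months = (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month) + 1
months = months.astype("Int64")
print(months)
```

As a check, April 2009 through July 2010 covers 1 year and 3 months, so the inclusive count is 12 + 3 + 1 = 16 months.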

### Analysis Examples

- Anime with the highest score:
```python
df[df['Score'] == df['Score'].max()]['Title']
```
- Top 5 highest scoring anime:
```python
df.nlargest(5, 'Score')[['Title', 'Score']]
```
- Anime with the highest episode count:
```python
df[df['Episodes'] == df['Episodes'].max()]
```
- Anime with the top 5 episode counts:
```python
df.nlargest(5, 'Episodes')[['Title', 'Episodes']]
```
- Longest running anime (top 5 by months):
```python
df.nlargest(5, 'Months')[['Title', 'Months']]
```
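
The two query styles used in this list behave differently when values are tied. The sketch below runs both on a small made-up `toy` frame (its titles and numbers are hypothetical) to show what each returns.

```python
import pandas as pd

# Hypothetical data; "A" and "C" share the highest score
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Score": [9.1, 8.7, 9.1, 7.5],
    "Episodes": [64, 12, 1, 148],
})

# A boolean mask keeps every row tied for the maximum score
top_scored = toy.loc[toy["Score"] == toy["Score"].max(), "Title"]

# nlargest returns exactly n rows, sorted by the column in descending order
top_two_by_episodes = toy.nlargest(2, "Episodes")[["Title", "Episodes"]]

print(top_scored.tolist())
print(top_two_by_episodes)
```

The mask is a reasonable choice when all tied rows matter, while `nlargest` is convenient when a fixed number of rows is wanted.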


---
## 🚀 Next Steps
Create one more project, then continue with the next notebook:
* **9_Countries.ipynb**