> # **DataFrame**

A `DataFrame` in `pandas` is like a table or spreadsheet where data is organized in rows and columns. Think of it as a way to store and manipulate tabular data in Python. Each column in a DataFrame can hold different types of data, such as numbers, text, dates, or even complex objects.

### Key Points of a DataFrame
- `Rows:` Represent individual records (like a person, product, or event).

- `Columns:` Represent different types of information (such as name, age, price, or date).

- `Index:` Each row has an index label that identifies it and lets you look the row up quickly.
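
To make the three parts concrete, here is a minimal sketch on a tiny table. The frame `people`, its values, and the labels `p1`/`p2` are made up purely for illustration.

```python
import pandas as pd

# Hypothetical two-row table, used only to illustrate rows, columns, and index.
people = pd.DataFrame(
    {"name": ["Ana", "Ben"], "age": [31, 25]},
    index=["p1", "p2"],  # custom row labels (the index)
)

print(people.columns.tolist())  # column labels
print(people.index.tolist())    # row labels

row = people.loc["p2"]          # look up one row by its index label
print(row["age"])
```

`.loc` selects by label, so the lookup keeps working even if the rows are reordered.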

### `Example` of a DataFrame

Suppose you have data about movies, with each row representing a movie and each column representing details about that movie (like title, genre, rating, and release year).

| Title | Genre | Rating | Year |
| --- | --- | --- | --- |
| The Shawshank Redemption | Drama | 9.3 | 1994 |
| Inception | Sci-Fi | 8.8 | 2010 |
| Titanic | Romance | 7.8 | 1997 |

---

In pandas, you would store this data as a DataFrame:

```python
import pandas as pd

# Creating a DataFrame
movies_data = {
    'Title': ['The Shawshank Redemption', 'Inception', 'Titanic'],
    'Genre': ['Drama', 'Sci-Fi', 'Romance'],
    'Rating': [9.3, 8.8, 7.8],
    'Year': [1994, 2010, 1997]
}

df = pd.DataFrame(movies_data)
print(df)
```

---

### When to Use a DataFrame

You should use a DataFrame whenever:


- `You have tabular data:` Data organized in rows and columns, like you would see in an Excel sheet.

- `Data needs different types:` Each column can be a different data type (e.g., numbers, strings).

- `Data needs manipulation:` Pandas provides powerful functions to filter, sort, group, aggregate, merge, and transform the data.
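
As a rough illustration of the manipulation point, the sketch below filters, sorts, and groups the small movie table from the earlier example. The variable names (`movies_df`, `top_rated`, and so on) and the 8.0 threshold are arbitrary choices for this sketch.

```python
import pandas as pd

# Same movie data as in the example table above.
movies_df = pd.DataFrame({
    "Title": ["The Shawshank Redemption", "Inception", "Titanic"],
    "Genre": ["Drama", "Sci-Fi", "Romance"],
    "Rating": [9.3, 8.8, 7.8],
    "Year": [1994, 2010, 1997],
})

# Filter: keep only movies rated above 8.0
top_rated = movies_df[movies_df["Rating"] > 8.0]

# Sort: newest release first
newest_first = movies_df.sort_values("Year", ascending=False)

# Group and aggregate: mean rating per genre
mean_by_genre = movies_df.groupby("Genre")["Rating"].mean()

print(top_rated["Title"].tolist())
print(newest_first["Title"].tolist())
print(mean_by_genre)
```

Each operation returns a new object and leaves `movies_df` unchanged.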

### Why Use a DataFrame in pandas?

- `Data Manipulation:` Pandas offers a variety of methods to clean, reshape, filter, and analyze data.

- `Data Analysis:` You can quickly calculate summaries, perform group operations, and handle missing data.

- `Data Transformation:` DataFrames support complex operations, such as merging, joining, and stacking tables.

DataFrames make it easy to organize, explore, and process data, which makes them essential for almost any data analysis or data science task in Python.
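
The points about summaries, missing data, and merging can be sketched as follows. The two small frames are assumptions made for this example; in particular, the "Untitled Film" row is a hypothetical entry added only to show a missing value.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings table with one missing rating.
ratings = pd.DataFrame({
    "Title": ["Inception", "Titanic", "Untitled Film"],
    "Rating": [8.8, 7.8, np.nan],
})

# Hypothetical second table holding release years for some titles.
years = pd.DataFrame({
    "Title": ["Inception", "Titanic"],
    "Year": [2010, 1997],
})

# Summary: mean() skips missing values by default.
mean_rating = ratings["Rating"].mean()

# Missing data: replace the missing rating with the mean.
filled = ratings["Rating"].fillna(mean_rating)

# Merging: a left join keeps every row of `ratings`.
merged = ratings.merge(years, on="Title", how="left")
print(merged)
```

A left join is used here so that titles without a matching year are kept, with `NaN` in the `Year` column.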

In [1]:
import pandas as pd
import numpy as np

## **Creating a DataFrame**

In [2]:
# using lists
student_data = [
    [100,80,10],
    [90,70,7],
    [120,100,14],
    [80,50,np.nan]  # package is missing for this student
]

column_name = ['iq','marks','package']
df = pd.DataFrame(student_data, columns=column_name, index=['Ahmad', 'Hassan', 'Muhammad', 'Hamza'])

In [3]:
df

| | iq | marks | package |
| --- | --- | --- | --- |
| Ahmad | 100 | 80 | 10.0 |
| Hassan | 90 | 70 | 7.0 |
| Muhammad | 120 | 100 | 14.0 |
| Hamza | 80 | 50 | NaN |


In [4]:
# using dicts

student_dict = {
    'name':['hasnain','ahmad','musa','ali','imran','essa'],
    'iq':[100,90,120,80,0,0],
    'marks':[80,70,100,50,0,0],
    'package':[10,7,14,2,0,0]
}

students = pd.DataFrame(student_dict, index=['hasnain','ahmad','musa','ali','imran','essa'])
students

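| | name | iq | marks | package |
| --- | --- | --- | --- | --- |
| hasnain | hasnain | 100 | 80 | 10 |
| ahmad | ahmad | 90 | 70 | 7 |
| musa | musa | 120 | 100 | 14 |
| ali | ali | 80 | 50 | 2 |
| imran | imran | 0 | 0 | 0 |
| essa | essa | 0 | 0 | 0 |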
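
In the output above the names appear twice, once as the index and once as a regular column. One possible alternative, sketched here on the first three students only, is to build the frame without an explicit index and then promote the `name` column with `set_index`.

```python
import pandas as pd

# First three students from the dictionary above (shortened for the sketch).
student_dict = {
    "name": ["hasnain", "ahmad", "musa"],
    "iq": [100, 90, 120],
    "marks": [80, 70, 100],
    "package": [10, 7, 14],
}

# set_index moves 'name' out of the columns and uses it as the row labels.
students = pd.DataFrame(student_dict).set_index("name")
print(students)
print(students.loc["musa", "iq"])
```

Whether to keep `name` as a column as well is a design choice; this variant simply avoids storing the same values twice.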


In [5]:
# using read_csv
movies = pd.read_csv('./bollywood.csv')
movies

| | movie | lead |
| --- | --- | --- |
| 0 | Uri: The Surgical Strike | Vicky Kaushal |
| 1 | Battalion 609 | Vicky Ahuja |
| 2 | The Accidental Prime Minister (film) | Anupam Kher |
| 3 | Why Cheat India | Emraan Hashmi |
| 4 | Evening Shadows | Mona Ambegaonkar |
| ... | ... | ... |
| 1495 | Hum Tumhare Hain Sanam | Shah Rukh Khan |
| 1496 | Aankhen (2002 film) | Amitabh Bachchan |
| 1497 | Saathiya (film) | Vivek Oberoi |
| 1498 | Company (film) | Ajay Devgn |


In [6]:
# shape

In [7]:
# dtypes

In [8]:
# index

In [9]:
# columns
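
The four attributes named in the cells above (`shape`, `dtypes`, `index`, `columns`) can be tried on the student table from the list-based example. The sketch rebuilds that table so it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the student table from the list-based example.
df = pd.DataFrame(
    [[100, 80, 10], [90, 70, 7], [120, 100, 14], [80, 50, np.nan]],
    columns=["iq", "marks", "package"],
    index=["Ahmad", "Hassan", "Muhammad", "Hamza"],
)

print(df.shape)             # (rows, columns) -> (4, 3)
print(df.dtypes)            # one dtype per column
print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column labels
```

These are attributes, not methods, so they are written without parentheses. The `package` column is a float column because it contains a missing value.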

In [10]:
# head and tail

In [11]:
# sample
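
`head`, `tail`, and `sample` all return a few rows for a quick look at the data. Since the movies CSV is not reproduced here, the sketch uses a hypothetical ten-row frame called `numbers`.

```python
import pandas as pd

# Hypothetical ten-row frame standing in for a larger dataset.
numbers = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first_rows = numbers.head(3)   # first 3 rows (default is 5)
last_rows = numbers.tail(2)    # last 2 rows (default is 5)

# 4 randomly chosen rows; random_state makes the choice reproducible.
random_rows = numbers.sample(n=4, random_state=0)

print(first_rows)
print(last_rows)
print(random_rows)
```

`sample` is handy when the first and last rows of a file are not representative of the rest.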

In [12]:
# info
movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1500 entries, 0 to 1499
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   movie   1500 non-null   object
 1   lead    1500 non-null   object
dtypes: object(2)
memory usage: 23.6+ KB


In [13]:
# describe
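
`describe` summarizes the numeric columns (count, mean, standard deviation, min, quartiles, max). A minimal sketch on the student table, rebuilt here so the block is self-contained:

```python
import numpy as np
import pandas as pd

# Student table from the list-based example.
df = pd.DataFrame(
    [[100, 80, 10], [90, 70, 7], [120, 100, 14], [80, 50, np.nan]],
    columns=["iq", "marks", "package"],
    index=["Ahmad", "Hassan", "Muhammad", "Hamza"],
)

summary = df.describe()
print(summary)

# Individual statistics can be read back by row and column label.
print(summary.loc["mean", "iq"])        # 97.5
print(summary.loc["count", "package"])  # 3.0, the missing value is not counted
```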

In [14]:
# isnull

In [15]:
# duplicated
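
`isnull` and `duplicated` both return boolean results that flag problem values. The frame `scores` below is a hypothetical example with one missing value and one repeated row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 2 repeats row 0, and the last package is missing.
scores = pd.DataFrame({
    "iq": [100, 90, 100, 80],
    "package": [10.0, 7.0, 10.0, np.nan],
})

missing_mask = scores.isnull()              # True where a value is missing
missing_per_column = scores.isnull().sum()  # number of missing values per column
duplicate_rows = scores.duplicated()        # True for rows that repeat an earlier row

print(missing_per_column)
print(duplicate_rows.tolist())
```

Summing a boolean result counts the `True` values, which is why `isnull().sum()` gives the number of missing entries per column.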

In [16]:
# rename
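
`rename` takes a mapping from old labels to new ones. In this sketch the new column names `percent` and `lpa` and the new row label are arbitrary examples, not names used elsewhere in the notebook.

```python
import numpy as np
import pandas as pd

# Student table from the list-based example.
df = pd.DataFrame(
    [[100, 80, 10], [90, 70, 7], [120, 100, 14], [80, 50, np.nan]],
    columns=["iq", "marks", "package"],
    index=["Ahmad", "Hassan", "Muhammad", "Hamza"],
)

# Rename two columns and one row label; labels not in the mapping stay as they are.
renamed = df.rename(
    columns={"marks": "percent", "package": "lpa"},
    index={"Hamza": "Hamza K."},
)

print(renamed.columns.tolist())
print(renamed.index.tolist())
print(df.columns.tolist())  # the original frame is unchanged
```

By default `rename` returns a new DataFrame, so the result has to be assigned to a variable to be kept.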

In [17]:
# sum -> axis argument
df

| | iq | marks | package |
| --- | --- | --- | --- |
| Ahmad | 100 | 80 | 10.0 |
| Hassan | 90 | 70 | 7.0 |
| Muhammad | 120 | 100 | 14.0 |
| Hamza | 80 | 50 | NaN |


In [18]:
df['iq'].sum()  # pandas method; the built-in sum(df['iq']) gives the same total

390
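
The `axis` argument mentioned above controls the direction of the sum. A minimal sketch on the same student table, rebuilt here so it runs on its own:

```python
import numpy as np
import pandas as pd

# Student table from the list-based example.
df = pd.DataFrame(
    [[100, 80, 10], [90, 70, 7], [120, 100, 14], [80, 50, np.nan]],
    columns=["iq", "marks", "package"],
    index=["Ahmad", "Hassan", "Muhammad", "Hamza"],
)

# axis=0 (the default): add down the rows, giving one total per column.
col_totals = df.sum(axis=0)

# axis=1: add across the columns, giving one total per row.
row_totals = df.sum(axis=1)

print(col_totals)
print(row_totals)
```

Missing values are skipped by default (`skipna=True`), so Hamza's row total uses only `iq` and `marks`.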