# DataFrame

## What is a DataFrame?
A Pandas DataFrame is a two-dimensional, tabular data structure in Python, similar to an Excel spreadsheet or a SQL table. It stores data in rows and columns, which makes it convenient to inspect, filter, and transform data during analysis.

### Key Features of a DataFrame:
Labeled Axes:
Rows are labeled with an index.
Columns have names (headers).

Heterogeneous Data:
Each column can hold different data types: integers, floats, strings, booleans, etc.

Size Mutable:
You can add, delete, or modify rows and columns easily.

Integrated with NumPy:
Built on top of NumPy, which makes mathematical operations fast.
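
The features listed above can be checked on a small example. The sketch below uses a hypothetical three-row table (the names and values are made up for illustration) and shows the labeled axes, the per-column data types, and adding a column and dropping a row.

```python
import pandas as pd

# Hypothetical three-row table, used only for illustration
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cy"],
        "score": [90.5, 80.0, 70.25],
        "passed": [True, True, False],
    },
    index=["r1", "r2", "r3"],
)

print(df.index.tolist())    # row labels
print(df.columns.tolist())  # column names
print(df.dtypes)            # each column has its own dtype

df["age"] = [18, 19, 20]    # add a column in place
df = df.drop(index="r3")    # drop a row (drop returns a new DataFrame)
print(df.shape)
```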

In [15]:
# Importing the pandas library
import pandas as pd

In [2]:
# Creating a sample dataset
data={"name":["Bill","Tom","Tim","John","Alex","Vanessa","Kate"],      
      "score":[90,80,85,75,95,60,65],      
      "sport":["Wrestling","Football","Skiing","Swimming","Tennis",
               "Karete","Surfing"],      
      "gender":["M","M","M","M","F","F","F"]}

In [3]:
# Creating a dataframe from the sample dataset
df=pd.DataFrame(data)

In [4]:
df

Unnamed: 0,name,score,sport,gender
0,Bill,90,Wrestling,M
1,Tom,80,Football,M
2,Tim,85,Skiing,M
3,John,75,Swimming,M
4,Alex,95,Tennis,F
5,Vanessa,60,Karete,F
6,Kate,65,Surfing,F


In [6]:
# Specifying the column order as name, sport, gender, score. Compare with the previous output
df=pd.DataFrame(data,columns=["name","sport","gender","score"])  
df

Unnamed: 0,name,sport,gender,score
0,Bill,Wrestling,M,90
1,Tom,Football,M,80
2,Tim,Skiing,M,85
3,John,Swimming,M,75
4,Alex,Tennis,F,95
5,Vanessa,Karete,F,60
6,Kate,Surfing,F,65
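
The `columns` argument sets the order at construction time. If the DataFrame already exists, one possible way to get the same effect is to select the columns with a list in the desired order. The sketch below uses a shortened, two-row version of the sample data.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = {"name": ["Bill", "Tom"], "score": [90, 80], "sport": ["Wrestling", "Football"]}
df = pd.DataFrame(data)

# Selecting with a list of names returns a new DataFrame in that order;
# the original DataFrame keeps its own column order
reordered = df[["name", "sport", "score"]]
print(reordered.columns.tolist())
print(df.columns.tolist())
```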


In [47]:
df.head()   # Displaying first few records

Unnamed: 0,name,sport,gender,score
0,Bill,Wrestling,M,90
1,Tom,Football,M,80
2,Tim,Skiing,M,85
3,John,Swimming,M,75
4,Alex,Tennis,F,95


In [7]:
df.tail()    # Displaying last few records

Unnamed: 0,name,sport,gender,score
2,Tim,Skiing,M,85
3,John,Swimming,M,75
4,Alex,Tennis,F,95
5,Vanessa,Karete,F,60
6,Kate,Surfing,F,65


In [8]:
df.tail(3)  # Displaying last three records

Unnamed: 0,name,sport,gender,score
4,Alex,Tennis,F,95
5,Vanessa,Karete,F,60
6,Kate,Surfing,F,65


In [9]:
df.head(2)   # Displaying first two records

Unnamed: 0,name,sport,gender,score
0,Bill,Wrestling,M,90
1,Tom,Football,M,80
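
Both `head` and `tail` take an optional row count `n` that defaults to 5. A negative `n` is also accepted: `head(-2)` returns everything except the last two rows. A small sketch on a hypothetical seven-row frame:

```python
import pandas as pd

# Hypothetical single-column frame with seven rows (index 0..6)
df = pd.DataFrame({"x": range(7)})

first = df.head()               # n defaults to 5
last_three = df.tail(3)         # last three rows
all_but_last_two = df.head(-2)  # negative n: drop the last two rows

print(len(first))
print(last_three.index.tolist())
print(all_but_last_two.index.tolist())
```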


In [11]:
# Requesting a column 'age' that is not in the data. The new column is filled with missing values (NaN)
df=pd.DataFrame(data,columns=["name", "sport", "gender", "score", "age"])   
df

Unnamed: 0,name,sport,gender,score,age
0,Bill,Wrestling,M,90,
1,Tom,Football,M,80,
2,Tim,Skiing,M,85,
3,John,Swimming,M,75,
4,Alex,Tennis,F,95,
5,Vanessa,Karete,F,60,
6,Kate,Surfing,F,65,
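
The empty cells in the `age` column are missing values (`NaN`). One way to confirm this is with `isna()`, sketched here on a shortened two-row version of the data.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
data = {"name": ["Bill", "Tom"], "score": [90, 80]}
df = pd.DataFrame(data, columns=["name", "score", "age"])

# "age" is not a key in data, so every entry in that column is missing
print(df["age"].isna().all())
# Count of missing values per column
print(df.isna().sum())
```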


In [12]:
df=pd.DataFrame(data,columns=["name", "sport", "gender", "score", "age"],
                index=["one","two","three","four","five","six","seven"]) # Setting custom row labels (index)
df

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,
two,Tom,Football,M,80,
three,Tim,Skiing,M,85,
four,John,Swimming,M,75,
five,Alex,Tennis,F,95,
six,Vanessa,Karete,F,60,
seven,Kate,Surfing,F,65,
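
Row labels can also be changed after the DataFrame has been created. The sketch below, on a shortened three-row frame, shows two options: assigning a full list to `df.index`, and renaming individual labels with `rename`.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({"name": ["Bill", "Tom", "Tim"], "score": [90, 80, 85]})

# Replace all row labels at once; the list length must match the row count
df.index = ["one", "two", "three"]

# Rename selected labels; rename returns a new DataFrame
renamed = df.rename(index={"one": "first"})

print(df.index.tolist())
print(renamed.index.tolist())
```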


In [53]:
df["sport"]   # Accessing data from column "sport"

one      Wrestling
two       Football
three       Skiing
four      Swimming
five        Tennis
six         Karete
seven      Surfing
Name: sport, dtype: object

In [13]:
my_columns=["name","sport"]   # Accessing data from columns "name" and "sport"
df[my_columns]

Unnamed: 0,name,sport
one,Bill,Wrestling
two,Tom,Football
three,Tim,Skiing
four,John,Swimming
five,Alex,Tennis
six,Vanessa,Karete
seven,Kate,Surfing


In [55]:
df.sport   # Accessing column "sport" with attribute (dot) notation

one      Wrestling
two       Football
three       Skiing
four      Swimming
five        Tennis
six         Karete
seven      Surfing
Name: sport, dtype: object
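
Dot notation is convenient, but it only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute. The sketch below uses hypothetical column names (`count`, `home town`) to show where bracket notation is needed.

```python
import pandas as pd

# Hypothetical columns chosen to illustrate the limits of dot notation
df = pd.DataFrame(
    {"sport": ["Skiing", "Tennis"], "count": [3, 5], "home town": ["Oslo", "Rome"]}
)

print(df.sport.tolist())         # works: "sport" is a plain identifier
print(callable(df.count))        # df.count is the DataFrame.count method, not the column
print(df["count"].tolist())      # bracket notation reaches the column
print(df["home town"].tolist())  # names with spaces need brackets as well
```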

In [56]:
df.loc[["one"]]    # Accessing data from row "one"

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,


In [57]:
df.loc[["one","two"]]   # Accessing data from row "one" and "two"

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,
two,Tom,Football,M,80,
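
The cells above pass a list of labels to `loc`, which returns a DataFrame. A few related forms are sketched below on a shortened three-row frame: a single label (returns a Series), position-based access with `iloc`, and a row/column pair for a single cell.

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame(
    {"name": ["Bill", "Tom", "Tim"], "score": [90, 80, 85]},
    index=["one", "two", "three"],
)

row_as_series = df.loc["one"]    # single label: a Series
row_as_frame = df.loc[["one"]]   # list of labels: a DataFrame
first_by_position = df.iloc[0]   # integer position instead of label
cell = df.loc["two", "score"]    # row label and column name: one value

print(type(row_as_series).__name__, type(row_as_frame).__name__)
print(cell)
```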


In [58]:
df["age"]=18    # Assigns 18 to all locations under column "age"
df.head()

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,18
two,Tom,Football,M,80,18
three,Tim,Skiing,M,85,18
four,John,Swimming,M,75,18
five,Alex,Tennis,F,95,18


In [59]:
df=pd.DataFrame(data,columns=["name", "sport", "gender", "score", "age"], 
                index=["one","two","three","four","five","six","seven"])
values=[18,19,20,18,17,17,18]
df["age"]=values     # Assigns values from the list "values" to the column "age"
df

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,18
two,Tom,Football,M,80,19
three,Tim,Skiing,M,85,20
four,John,Swimming,M,75,18
five,Alex,Tennis,F,95,17
six,Vanessa,Karete,F,60,17
seven,Kate,Surfing,F,65,18
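
When a list is assigned to a column, its length has to match the number of rows. A Series behaves differently: it is aligned on the index, and rows without a matching label get `NaN`. A sketch on a shortened three-row frame:

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({"name": ["Bill", "Tom", "Tim"]}, index=["one", "two", "three"])

# A list with the wrong length is rejected
try:
    df["age"] = [18, 19]
except ValueError as err:
    print("ValueError:", err)

# A Series is matched by index label, not by position;
# "two" has no entry in the Series, so it becomes NaN
df["age"] = pd.Series({"three": 20, "one": 18})
print(df)
```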


In [60]:
df["pass"]=df.score>=70    # Creates a new column "pass": True if score is greater than or equal to 70, otherwise False
df

Unnamed: 0,name,sport,gender,score,age,pass
one,Bill,Wrestling,M,90,18,True
two,Tom,Football,M,80,19,True
three,Tim,Skiing,M,85,20,True
four,John,Swimming,M,75,18,True
five,Alex,Tennis,F,95,17,True
six,Vanessa,Karete,F,60,17,False
seven,Kate,Surfing,F,65,18,False
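
The same comparison can be used as a boolean mask to filter rows instead of storing it in a column. A minimal sketch on a shortened three-row frame:

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({"name": ["Bill", "Vanessa", "Kate"], "score": [90, 60, 65]})

mask = df["score"] >= 70   # boolean Series, one value per row
passed = df[mask]          # rows where the mask is True
failed = df[~mask]         # ~ inverts the mask

print(passed["name"].tolist())
print(failed["name"].tolist())
```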


In [61]:
del df["pass"]   # Deletes the column "pass"
df

Unnamed: 0,name,sport,gender,score,age
one,Bill,Wrestling,M,90,18
two,Tom,Football,M,80,19
three,Tim,Skiing,M,85,20
four,John,Swimming,M,75,18
five,Alex,Tennis,F,95,17
six,Vanessa,Karete,F,60,17
seven,Kate,Surfing,F,65,18
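
`del` removes the column from the DataFrame in place. If the original should stay untouched, `drop(columns=...)` is one alternative, since it returns a new DataFrame. A sketch on a shortened two-row frame:

```python
import pandas as pd

# Shortened version of the sample data, for illustration only
df = pd.DataFrame({"name": ["Bill", "Tom"], "score": [90, 80], "pass": [True, True]})

# drop returns a new DataFrame; df itself still has the "pass" column
trimmed = df.drop(columns=["pass"])

print(df.columns.tolist())
print(trimmed.columns.tolist())
```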


In [62]:
scores={"Math":{"A":85,"B":90,"C":95}, "Physics":{"A":90,"B":80,"C":75}}   # A dictionary

In [63]:
scores_df=pd.DataFrame(scores)  # Creating dataframe from the dictionary "scores"
scores_df

Unnamed: 0,Math,Physics
A,85,90
B,90,80
C,95,75
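
With a nested dictionary, the outer keys become column names and the inner keys become row labels. If an inner key is missing for some column, that cell is filled with `NaN`. The sketch below uses a modified version of `scores` with gaps added on purpose.

```python
import pandas as pd

# Modified version of the scores dictionary: "C" is missing for Math, "B" for Physics
scores = {"Math": {"A": 85, "B": 90}, "Physics": {"A": 90, "C": 75}}
scores_df = pd.DataFrame(scores)

print(scores_df)
print(scores_df.isna().sum())  # one missing value in each column
```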


In [64]:
scores_df.T   # Transpose of the dataframe

Unnamed: 0,A,B,C
Math,85,90,95
Physics,90,80,75
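
Transposing swaps the row labels and the column names and returns a new DataFrame; the original is not modified. A short sketch that rebuilds the same `scores` data to stay self-contained:

```python
import pandas as pd

scores = {"Math": {"A": 85, "B": 90, "C": 95}, "Physics": {"A": 90, "B": 80, "C": 75}}
scores_df = pd.DataFrame(scores)

transposed = scores_df.T   # rows become columns and vice versa

print(scores_df.shape, transposed.shape)
print(transposed.index.tolist(), transposed.columns.tolist())
```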


In [14]:
scores_df.index.name="name"      # Labeling the index
scores_df.columns.name="subject"   # Labeling the column names

In [66]:
scores_df

subject,Math,Physics
name,,
A,85,90
B,90,80
C,95,75


In [67]:
scores_df.values   # Interpreting dataframe as NumPy array

array([[85, 90],
       [90, 80],
       [95, 75]], dtype=int64)
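
The `to_numpy()` method does the same conversion and is the form the pandas documentation recommends over `.values`. When the columns have different types, the resulting array falls back to the generic `object` dtype. A sketch with a two-row version of the scores and a hypothetical mixed-type frame:

```python
import pandas as pd

# Two-row version of the scores data, for illustration only
scores_df = pd.DataFrame({"Math": {"A": 85, "B": 90}, "Physics": {"A": 90, "B": 80}})
arr = scores_df.to_numpy()   # all columns are integers, so the array is numeric

# Hypothetical frame mixing text and numbers
mixed = pd.DataFrame({"name": ["Bill"], "score": [90]}).to_numpy()

print(arr)
print(arr.dtype, mixed.dtype)
```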

In [None]:
scores_index=scores_df.index

In [None]:
# Index objects are immutable, so the assignment below would raise a TypeError if uncommented
#scores_index[1]="Jack"
scores_index
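
The commented-out assignment can be tried safely inside a `try` block. The sketch below shows that the Index rejects item assignment and that `rename` is one way to get a DataFrame with a changed label instead. It uses a single-column version of the scores data.

```python
import pandas as pd

# Single-column version of the scores data, for illustration only
scores_df = pd.DataFrame({"Math": {"A": 85, "B": 90, "C": 95}})
scores_index = scores_df.index

raised = False
try:
    scores_index[1] = "Jack"   # Index does not support item assignment
except TypeError as err:
    raised = True
    print("TypeError:", err)

# rename builds a new DataFrame with the label replaced
renamed = scores_df.rename(index={"B": "Jack"})
print(renamed.index.tolist())
```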
