# Creating and Viewing DataFrames and Series

Pandas is a popular open-source Python library for data manipulation and analysis. It provides data structures and functions that make working with structured (tabular) data, such as CSV files, Excel sheets, and SQL tables, efficient and straightforward. The name *Pandas* is derived from "panel data," a statistics and econometrics term for data that track the same entities over multiple time periods.

**DataFrame:** The core data structure in Pandas is the DataFrame, a two-dimensional labeled data structure with columns of potentially different data types. It resembles a spreadsheet or SQL table and allows data to be organized and analyzed effectively.

**Series:** Another essential data structure in Pandas is the Series, which is a one-dimensional labeled array. Series can hold data of any type and are used to represent a single column or row of data in a DataFrame.
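The two structures are closely linked: pulling one column, or one row, out of a DataFrame gives you a Series. The minimal sketch below shows this on a tiny illustrative DataFrame built from the same names used later in this notebook.

```python
import pandas as pd

# A small DataFrame with two columns of different dtypes (illustrative data)
df = pd.DataFrame({"Name": ["Ajit", "Sujit"], "Age": [23, 22]})

# Selecting a single column returns a Series that shares the DataFrame's row index
age_col = df["Age"]
print(type(age_col))  # <class 'pandas.core.series.Series'>

# Selecting a single row by position also returns a Series, indexed by the column names
first_row = df.iloc[0]
print(first_row["Name"], first_row["Age"])  # Ajit 23

# A DataFrame is two-dimensional; a Series is one-dimensional
print(df.ndim, age_col.ndim)  # 2 1
```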

```python
import pandas as pd
```

## Creating and Viewing DataFrames and Series

### DataFrame

**From an external file**

```python
df_ext = pd.read_csv("data/nba.csv")  # Load the dataset

print(type(df_ext))  # Check the type of the object
print(df_ext.shape)  # Check the (rows, columns) dimensions
```

```
<class 'pandas.core.frame.DataFrame'>
(458, 9)
```
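`pd.read_csv` accepts either a file path or any file-like object. Since `data/nba.csv` is not reproduced here, the sketch below assumes a small hypothetical CSV string and reads it through `io.StringIO`, which makes the example self-contained. Note that the header row becomes the column labels and is not counted in `shape`, and an empty field is read as `NaN`.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file such as data/nba.csv
csv_text = """Name,Team,Age
Player A,Team X,25
Player B,Team Y,27
Player C,Team X,
"""

# read_csv parses the text into a DataFrame; the first line becomes the column labels
df_small = pd.read_csv(io.StringIO(csv_text))

print(type(df_small))  # <class 'pandas.core.frame.DataFrame'>
print(df_small.shape)  # (3, 3): 3 data rows, 3 columns
print(df_small["Age"].isna().sum())  # 1: the empty Age field became NaN
```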


**From a Python dictionary**

A Python dictionary (`dict`) maps keys to values:

- **Pairs:** Every item in a dictionary is a pair in which a unique key is associated with a value.
- **Access:** You retrieve a value by its key, not by a numeric position as you would in a list.
- **Analogy:** Think of a printed dictionary, where the word is the key and its definition is the value, or a phone book, where the name is the key and the phone number is the value.
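The phone-book analogy translates directly into code. The names and numbers below are hypothetical placeholders; the point is the contrast between key-based lookup in a `dict` and position-based lookup in a `list`.

```python
# A hypothetical phone book: names are keys, phone numbers are values
phone_book = {"Ajit": "555-0101", "Sujit": "555-0102"}

# Look up a value by its key, not by a numeric position
print(phone_book["Ajit"])  # 555-0101

# .get() returns a default instead of raising KeyError when the key is missing
print(phone_book.get("Ravi", "not found"))  # not found

# A list, by contrast, is accessed by integer position
names = ["Ajit", "Sujit"]
print(names[0])  # Ajit
```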

| Characteristic | Description |
| :--- | :--- |
| **Mutable** | You can change, add, or remove items (key-value pairs) after the dictionary has been created. |
| **Ordered (Since Python 3.7)** | Dictionaries preserve the **insertion order** of their keys. Before Python 3.7, this order was not guaranteed by the language. |
| **Indexed by Keys** | You access items by key (`my_dict['key']`), not by an integer position. |
| **Keys Must Be Unique** | Each key appears at most once. Assigning a value to an existing key overwrites the old value instead of creating a duplicate. |
| **Keys Must Be Hashable** | Keys must be hashable, which in practice means immutable types such as strings, numbers, or tuples of immutable values. Values can be of any type (lists, other dictionaries, etc.). |
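Each row of the table can be checked in a few lines. This is a small illustrative sketch with made-up keys; it assumes Python 3.7 or later for the ordering behavior.

```python
d = {"b": 1, "a": 2}

# Mutable: items can be added and changed after creation
d["c"] = 3
d["a"] = 20  # assigning to an existing key overwrites its value (keys stay unique)

# Ordered (Python 3.7+): iteration follows insertion order, not sorted order
print(list(d))  # ['b', 'a', 'c']
print(d)        # {'b': 1, 'a': 20, 'c': 3}

# Keys must be hashable: a tuple of numbers works, a list does not
d[(1, 2)] = "tuple key"
list_key_rejected = False
try:
    d[[1, 2]] = "list key"
except TypeError:
    list_key_rejected = True  # lists are unhashable, so Python raises TypeError
print(list_key_rejected)  # True
```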



```python
d = {"Name": ["Ajit", "Sujit"], "Age": [23, 22]}  # Create a dictionary

df_dict = pd.DataFrame(d)  # Create a DataFrame from the dictionary

df_dict.head(1)  # View the first row of the DataFrame
```

| | Name | Age |
| :--- | :--- | :--- |
| 0 | Ajit | 23 |
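When a DataFrame is built from a dictionary of lists, each key becomes a column label and each list supplies that column's values. The sketch below reuses the same dictionary and shows two optional `pd.DataFrame` parameters: `index=` for custom row labels (the labels `"a"` and `"b"` are arbitrary examples) and `columns=` to choose the column order. It also shows that all lists must have the same length.

```python
import pandas as pd

d = {"Name": ["Ajit", "Sujit"], "Age": [23, 22]}

# index= replaces the default 0, 1, ... row labels; columns= selects and orders columns
df_labeled = pd.DataFrame(d, index=["a", "b"], columns=["Age", "Name"])
print(df_labeled)

# Lists of different lengths cannot form a rectangular table, so pandas raises ValueError
mismatch_rejected = False
try:
    pd.DataFrame({"Name": ["Ajit", "Sujit"], "Age": [23]})
except ValueError as exc:
    mismatch_rejected = True
    print("ValueError:", exc)
```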


```python
df_ext.head(3)  # View the first 3 rows of the NBA DataFrame
```

| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 0 | Avery Bradley | Boston Celtics | 0.0 | PG | 25.0 | 6-2 | 180.0 | Texas | 7730337.0 |
| 1 | Jae Crowder | Boston Celtics | 99.0 | SF | 25.0 | 6-6 | 235.0 | Marquette | 6796117.0 |
| 2 | John Holland | Boston Celtics | 30.0 | SG | 27.0 | 6-5 | 205.0 | Boston University | NaN |
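`head(n)` returns the first `n` rows as a new DataFrame and leaves the original unchanged; without an argument it returns 5 rows. A negative `n` returns every row except the last `|n|`. The sketch uses a hypothetical ten-row frame so the counts are easy to check.

```python
import pandas as pd

# Ten rows of illustrative data
df = pd.DataFrame({"x": range(10)})

print(len(df.head()))              # 5: default n=5
print(len(df.head(3)))             # 3
print(df.head(-8)["x"].tolist())   # [0, 1]: all rows except the last 8
print(len(df))                     # 10: head() does not modify df
```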


```python
df_ext.shape  # Get the (rows, columns) shape of the NBA DataFrame
```

```
(458, 9)
```
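`shape` is an attribute, not a method, so it is written without parentheses. Because it is a plain `(rows, columns)` tuple, it can be unpacked directly. The sketch below uses the small dictionary-based DataFrame from earlier as an example.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ajit", "Sujit"], "Age": [23, 22]})

# shape is a (rows, columns) tuple, so it unpacks into two variables
n_rows, n_cols = df.shape
print(n_rows, n_cols)   # 2 2

# len() returns the row count; a Series' shape is a one-element tuple
print(len(df))          # 2
print(df["Age"].shape)  # (2,)
```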

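```python
df_ext.tail()  # View the last 5 rows of the NBA DataFrame
```

| | Name | Team | Number | Position | Age | Height | Weight | College | Salary |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 453 | Shelvin Mack | Utah Jazz | 8.0 | PG | 26.0 | 6-3 | 203.0 | Butler | 2433333.0 |
| 454 | Raul Neto | Utah Jazz | 25.0 | PG | 24.0 | 6-1 | 179.0 | NaN | 900000.0 |
| 455 | Tibor Pleiss | Utah Jazz | 21.0 | C | 26.0 | 7-3 | 256.0 | NaN | 2900000.0 |
| 456 | Jeff Withey | Utah Jazz | 24.0 | C | 26.0 | 7-0 | 231.0 | Kansas | 947276.0 |
| 457 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |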
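In the output above, row 457 is missing in every column. A minimal way to spot such rows is to combine `isna()` with `all(axis=1)`, and `dropna(how="all")` removes them. The sketch below builds a small illustrative frame that mimics this situation rather than loading the real file.

```python
import numpy as np
import pandas as pd

# Illustrative frame whose last row is missing in every column, like row 457 above
df = pd.DataFrame({
    "Name": ["Shelvin Mack", "Jeff Withey", np.nan],
    "Age": [26.0, 26.0, np.nan],
})

print(df.tail(2))  # the last 2 rows (tail() defaults to 5)

# Boolean mask: True where every value in the row is missing
all_missing = df.isna().all(axis=1)
print(all_missing.tolist())  # [False, False, True]

# dropna(how="all") drops only the rows in which every value is NaN
df_clean = df.dropna(how="all")
print(df_clean.shape)  # (2, 2)
```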


### Series

**From a Python list**

```python
lst = [1, 2, 3]  # Create a list

lst_series = pd.Series(lst)  # Convert the list to a pandas Series

type(lst_series)  # Check the type of the object
```

```
pandas.core.series.Series
```
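A Series created from a plain list gets default integer labels `0, 1, 2, ...`. The optional `index=` and `name=` parameters of `pd.Series` attach custom labels and a name; the labels `"a"`, `"b"`, `"c"` and the name `"count"` below are arbitrary examples. The name becomes the column label if the Series is later turned into a DataFrame.

```python
import pandas as pd

lst = [1, 2, 3]

# Without an explicit index, pandas labels the values 0, 1, 2
s = pd.Series(lst)
print(s.index.tolist())  # [0, 1, 2]

# index= supplies custom labels; name= labels the Series itself
s_labeled = pd.Series(lst, index=["a", "b", "c"], name="count")
print(s_labeled["b"])  # 2

# Converting to a one-column DataFrame uses the Series name as the column label
print(s_labeled.to_frame().columns.tolist())  # ['count']
```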

Next chapter: [Data Aggregation and Statistics](2.DataAggregationStatistics.ipynb)