### Introduction
To use pandas, we typically start by importing the pandas library:

In [5]:
import pandas as pd

#### Creating Data

There are two core objects in pandas: the DataFrame and the Series.

##### DataFrame
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.
For example:


In [6]:
pd.DataFrame({"Apples":[134, 25], "Bananas":[26,51]})

Unnamed: 0,Apples,Bananas
0,134,26
1,25,51


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [7]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

Unnamed: 0,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland.


We are using the pd.DataFrame() constructor to generate these DataFrame objects. The syntax for declaring a new one is a dictionary whose keys are the column names (Bob and Sue in this example) and whose values are lists of entries. This is the standard way of constructing a new DataFrame.


The dictionary-list constructor assigns values to the column labels, but just uses an ascending count from 0 (0, 1, 2, 3, ...) for the row labels. Sometimes this is OK, but oftentimes we will want to assign these labels ourselves.

The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an index parameter in our constructor:

In [8]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.
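
To make the difference between the default row labels and a custom Index concrete, here is a minimal sketch that builds the same DataFrame both ways and then looks up one entry by its row and column labels. The variable names are illustrative, and the `.loc` lookup goes slightly beyond what the text above covers; it is shown only as one common way to use the labels.

```python
import pandas as pd

reviews = {'Bob': ['I liked it.', 'It was awful.'],
           'Sue': ['Pretty good.', 'Bland.']}

# Without an index argument, pandas numbers the rows 0, 1, ...
default_df = pd.DataFrame(reviews)
default_labels = list(default_df.index)

# With an index argument, the rows carry the labels we supply
labeled_df = pd.DataFrame(reviews, index=['Product A', 'Product B'])
custom_labels = list(labeled_df.index)

# Label-based lookup: row 'Product B', column 'Sue'
entry = labeled_df.loc['Product B', 'Sue']

print(default_labels)  # [0, 1]
print(custom_labels)   # ['Product A', 'Product B']
print(entry)           # Bland.
```

Meaningful row labels make lookups like this readable, which is one reason to set the index explicitly.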


##### Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [9]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [1]:
import pandas as pd
pd.Series([1, 2, 3, 4, 5], index = ["Item 1", "Item 2", "Item 3", "Item 4", "Item 5"])

Item 1    1
Item 2    2
Item 3    3
Item 4    4
Item 5    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name:

In [12]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64

The Series and the DataFrame are intimately related. It's helpful to think of a DataFrame as actually being just a bunch of Series "glued together". 
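
The "glued together" picture can be sketched in code. The example below assumes two named Series that share the same row labels (the `Product B` values are made up for illustration) and uses `pd.concat` with `axis=1` as one possible way to join them side by side. Selecting a column from the result gives back a Series.

```python
import pandas as pd

years = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=years, name='Product A')
# Illustrative values, not taken from the text above
product_b = pd.Series([21, 34, 11], index=years, name='Product B')

# Glue the two Series together column-wise; each Series name becomes a column name
sales = pd.concat([product_a, product_b], axis=1)

# Pulling one column back out yields a Series again
column = sales['Product A']

print(list(sales.columns))    # ['Product A', 'Product B']
print(type(column).__name__)  # Series
print(column.name)            # Product A
```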

### Reading data files
Being able to create a DataFrame or Series by hand is handy. But, most of the time, we won't actually be creating our own data by hand. Instead, we'll be working with data that already exists.

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. When you open a CSV file you get something that looks like this:
```csv
Product A,Product B,Product C
30,21,9
35,34,1
41,11,11
```
So a CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.

Let's now see what a dataset looks like when we read it from a file into a DataFrame. We'll use the pd.read_csv() function for this. First we write a small sample file to disk, then read it back:


In [15]:
s = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11"""
from io import StringIO
pd.read_csv(StringIO(s)).to_csv("sample.csv")

In [18]:
df = pd.read_csv("sample.csv")
df

Unnamed: 0.1,Unnamed: 0,Product A,Product B,Product C
0,0,30,21,9
1,1,35,34,1
2,2,41,11,11

Because to_csv() also wrote the row labels as an unnamed first column, read_csv() loads them back as an extra column named Unnamed: 0. Passing index_col=0 tells pandas to use that first column as the index instead:


In [19]:
df = pd.read_csv("sample.csv", index_col=0)
df

Unnamed: 0,Product A,Product B,Product C
0,30,21,9
1,35,34,1
2,41,11,11
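
The extra `Unnamed: 0` column seen above comes from the row labels that `to_csv()` writes by default. As a hedged sketch of an alternative to `index_col=0`, the row labels can be left out at write time with `index=False`. The example keeps everything in memory (calling `to_csv()` without a path returns the CSV text) so it does not depend on `sample.csv`.

```python
import pandas as pd
from io import StringIO

s = """Product A,Product B,Product C
30,21,9
35,34,1
41,11,11"""
df = pd.read_csv(StringIO(s))

# With no path argument, to_csv returns the CSV text as a string
with_index = df.to_csv()                # row labels written as a first column
without_index = df.to_csv(index=False)  # row labels omitted

reread_with = pd.read_csv(StringIO(with_index))
reread_without = pd.read_csv(StringIO(without_index))

print(list(reread_with.columns))     # ['Unnamed: 0', 'Product A', 'Product B', 'Product C']
print(list(reread_without.columns))  # ['Product A', 'Product B', 'Product C']
```

Which approach fits better depends on whether the row labels carry information worth keeping in the file.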


In [20]:
df.head(1)

Unnamed: 0,Product A,Product B,Product C
0,30,21,9


In [21]:
df.tail(1)

Unnamed: 0,Product A,Product B,Product C
2,41,11,11


In [22]:
df.shape

(3, 3)

### Different ways to create a DataFrame


In [23]:
import pandas as pd
data = [10,20,30,40,50,60]
df = pd.DataFrame(data, columns=['Numbers'])
df

Unnamed: 0,Numbers
0,10
1,20
2,30
3,40
4,50
5,60


In [24]:
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
df

Unnamed: 0,Name,Age
0,tom,10
1,nick,15
2,juli,14


In [25]:
df = pd.DataFrame({'Name': ['Tom', 'nick', 'krish', 'jack'], 'Age': [20, 21, 19, 18]})
df

Unnamed: 0,Name,Age
0,Tom,20
1,nick,21
2,krish,19
3,jack,18


In [19]:
data = {'a': 1, 'b': 2, 'c': 3}
pd.Series(data).to_frame().T

Unnamed: 0,a,b,c
0,1,2,3


In [27]:
import pandas as pd
sr = pd.Series([10, 20, 30, 40])
df = pd.DataFrame(sr)
df

Unnamed: 0,0
0,10
1,20
2,30
3,40


In [1]:
import pandas as pd
d = {'one': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd']),
     'two': pd.Series([10, 20, 30, 40],
                      index=['a', 'b', 'c', 'd'])}
  
df = pd.DataFrame(d)
df

Unnamed: 0,one,two
a,10,10
b,20,20
c,30,30
d,40,40


In [15]:
print(pd.DataFrame({}))

Empty DataFrame
Columns: []
Index: []


In [4]:
pd.read_excel('sample.xlsx')

ValueError: Excel file format cannot be determined, you must specify an engine manually.

In [5]:
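from io import StringIO
import pandas as pd
# CSV text with a space after each comma
data = """
Column A, Column B, Column C
Value1, 123, 23-09-2023
Value2, 147, 22-09-2023
Value3, 364, 21-09-2023
"""
# skipinitialspace=True drops the space that follows each delimiter
pd.read_csv(StringIO(data), skipinitialspace=True)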

Unnamed: 0,Column A,Column B,Column C
0,Value1,123,23-09-2023
1,Value2,147,22-09-2023
2,Value3,364,21-09-2023
