In [1]:
import pandas as pd

There are two core objects in pandas: the DataFrame and the Series.


# DataFrame
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [2]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

| | Yes | No |
|---|---|---|
| 0 | 50 | 131 |
| 1 | 21 | 2 |
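
To make the "row and column" idea concrete, here is a minimal sketch that looks up a single entry by its labels. The variable name `votes` is only an illustrative choice and is not used elsewhere in this notebook.

```python
import pandas as pd

# Same data as in the cell above; `votes` is a hypothetical name.
votes = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# .loc looks up one entry by its row label and its column label.
entry = votes.loc[0, 'No']
print(entry)  # 131

# The column labels come from the dictionary keys.
column_labels = list(votes.columns)
print(column_labels)  # ['Yes', 'No']
```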


DataFrame entries are not limited to integers. For instance, here's a DataFrame whose values are strings:

In [4]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})


| | Bob | Sue |
|---|---|---|
| 0 | I liked it. | Pretty good. |
| 1 | It was awful. | Bland. |


The dictionary-of-lists constructor assigns values to the column labels, but it simply uses an ascending count from 0 for the row labels.<br>
To change the row labels, we pass an **index** parameter to the constructor:

In [5]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']}, index=['Product A', 'Product B'])


| | Bob | Sue |
|---|---|---|
| Product A | I liked it. | Pretty good. |
| Product B | It was awful. | Bland. |
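
As a small sketch of what the index parameter changes, the example below builds the same data twice, once with the default row labels and once with custom ones, and then uses a custom label for a lookup. The names `data`, `default_labels`, and `custom_labels` are illustrative only.

```python
import pandas as pd

data = {'Bob': ['I liked it.', 'It was awful.'],
        'Sue': ['Pretty good.', 'Bland.']}

# Without index=, pandas numbers the rows 0, 1, ...
default_labels = pd.DataFrame(data)
# With index=, the given labels are used instead.
custom_labels = pd.DataFrame(data, index=['Product A', 'Product B'])

print(list(default_labels.index))  # [0, 1]
print(list(custom_labels.index))   # ['Product A', 'Product B']

# Rows can now be looked up by their custom label.
review = custom_labels.loc['Product B', 'Sue']
print(review)  # Bland.
```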


# Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [7]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

A Series is, in essence, a single column of a DataFrame. So you can assign row labels to the Series the same way as before, using an index parameter. However, a Series does not have a column name; it only has one overall name:

In [8]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
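
The relationship between a Series and a DataFrame column can be sketched as a round trip: converting the Series above into a one-column DataFrame and selecting that column back. The variable names `product_a`, `frame`, and `column` are illustrative only.

```python
import pandas as pd

product_a = pd.Series([30, 35, 40],
                      index=['2015 Sales', '2016 Sales', '2017 Sales'],
                      name='Product A')

# to_frame() wraps the Series in a one-column DataFrame;
# the Series name becomes the column label.
frame = product_a.to_frame()
print(list(frame.columns))  # ['Product A']

# Selecting a single column gives back a Series with that name.
column = frame['Product A']
print(type(column).__name__)  # Series
print(column.name)            # Product A
```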

# Reading data files

Data can be stored in any of a number of different forms and formats. By far the most basic of these is the humble CSV file. A CSV file is a table of values separated by commas. Hence the name: "Comma-Separated Values", or CSV.

In [9]:
survey = pd.read_csv("survey.csv")
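
Since `survey.csv` is not bundled with this text, the following is a minimal sketch of the same call on a tiny in-memory stand-in. The three-column `csv_text` is a cut-down sample made up for illustration, not the real file layout.

```python
import io
import pandas as pd

# A tiny stand-in for a CSV file: a header line, then one line per row.
csv_text = """Year,Variable_code,Value
2019,H01,728239
2019,H04,643809
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(list(sample.columns))  # ['Year', 'Variable_code', 'Value']
print(sample.shape)          # (2, 3)
```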

We use the shape attribute to check the size of the DataFrame:

In [10]:
survey.shape

(32445, 10)
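
The shape tuple is ordered as (rows, columns), so the survey data has 32,445 records across 10 columns. As a brief sketch on a small stand-in DataFrame (the name `small` is illustrative), the tuple can be unpacked directly:

```python
import pandas as pd

small = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# shape is a (number_of_rows, number_of_columns) tuple.
n_rows, n_cols = small.shape
print(n_rows, n_cols)  # 2 2

# len() of a DataFrame is its number of rows.
print(len(small))  # 2
```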

We use the head() method to view the first five rows of the DataFrame:

In [11]:
survey.head()

| | Year | Industry_aggregation_NZSIOC | Industry_code_NZSIOC | Industry_name_NZSIOC | Units | Variable_code | Variable_name | Variable_category | Value | Industry_code_ANZSIC06 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2019 | Level 1 | 99999 | All industries | Dollars (millions) | H01 | Total income | Financial performance | 728239 | ANZSIC06 divisions A-S (excluding classes K633... |
| 1 | 2019 | Level 1 | 99999 | All industries | Dollars (millions) | H04 | Sales, government funding, grants and subsidies | Financial performance | 643809 | ANZSIC06 divisions A-S (excluding classes K633... |
| 2 | 2019 | Level 1 | 99999 | All industries | Dollars (millions) | H05 | Interest, dividends and donations | Financial performance | 62924 | ANZSIC06 divisions A-S (excluding classes K633... |
| 3 | 2019 | Level 1 | 99999 | All industries | Dollars (millions) | H07 | Non-operating income | Financial performance | 21505 | ANZSIC06 divisions A-S (excluding classes K633... |
| 4 | 2019 | Level 1 | 99999 | All industries | Dollars (millions) | H08 | Total expenditure | Financial performance | 634710 | ANZSIC06 divisions A-S (excluding classes K633... |
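
head() returns five rows by default, but it also accepts a row count, and tail() does the same from the end. A minimal sketch on a made-up ten-row DataFrame (the name `numbers` and its column `n` are illustrative only):

```python
import pandas as pd

# A ten-row stand-in DataFrame holding the values 0..9.
numbers = pd.DataFrame({'n': range(10)})

print(len(numbers.head()))   # 5 (the default)
print(len(numbers.head(3)))  # 3

# tail(n) returns the last n rows instead.
last_two = numbers.tail(2)['n'].tolist()
print(last_two)  # [8, 9]
```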
