In [2]:
import pandas as pd
myseries = pd.Series(
    [10,20,30],
    index = ["a","b","c"]
)
print(myseries)

a    10
b    20
c    30
dtype: int64


In [4]:
myseries2 = pd.Series(
    ["Jane","John","Emily","Matt"]
)
print(myseries2[1])

John


In [7]:
myseries3 = pd.Series([1,2,3,2])
print(myseries3.is_unique)

False


## Testing a two-dimensional structure (DataFrame)

In [10]:
df = pd.DataFrame({
    "Name":["Jane","John","Matt","Ashley"],
    "Age":[24,21,26,32]
})
print(df.shape)

(4, 2)


### Read CSV file

In [12]:
sales = pd.read_csv("sales.csv")
print(sales.head())
print(sales.shape)

   product_code product_group  stock_qty    cost    price  last_week_sales   
0          4187           PG2        498  420.76   569.91               13  \
1          4195           PG2        473  545.64   712.41               16   
2          4204           PG2        968  640.42   854.91               22   
3          4219           PG2        241  869.69  1034.55               14   
4          4718           PG2       1401   12.54    26.59               50   

   last_month_sales  
0                58  
1                58  
2                88  
3                45  
4               285  
(1000, 7)


### Use usecols to read only certain columns and nrows to limit the rows

In [16]:
sales1 = pd.read_csv("sales.csv", usecols=["product_code","cost","price"], nrows=200)
print(sales1.shape)
print(sales1.head())

(200, 3)
   product_code    cost    price
0          4187  420.76   569.91
1          4195  545.64   712.41
2          4204  640.42   854.91
3          4219  869.69  1034.55
4          4718   12.54    26.59
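
The effect of `usecols` and `nrows` can also be checked without the file on disk. The sketch below is a minimal example that assumes a tiny in-memory CSV (the `csv_text` string and the `subset` name are illustrative stand-ins, not part of the original notebook) and reads it through `io.StringIO`.

```python
import io
import pandas as pd

# Hypothetical stand-in for sales.csv: a small CSV held in memory.
csv_text = """product_code,product_group,cost,price
4187,PG2,420.76,569.91
4195,PG2,545.64,712.41
4204,PG2,640.42,854.91
"""

# usecols keeps only the listed columns; nrows stops after the first 2 data rows.
subset = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["product_code", "price"],
    nrows=2,
)

print(subset.shape)
print(subset)
```

Because both parameters are applied while the file is parsed, the columns and rows that are left out are never loaded into the DataFrame.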


## Python dictionary
One of the most common ways to create a DataFrame is from a Python dictionary. We pass the dictionary to the DataFrame constructor: each key becomes a column name and each list becomes that column's values.

In [17]:
df = pd.DataFrame({
  "Names": ["Jane", "John", "Matt", "Ashley"],
  "Ages": [26, 24, 28, 25],
  "Score": [91.2, 94.1, 89.5, 92.3]
})

print(df)

    Names  Ages  Score
0    Jane    26   91.2
1    John    24   94.1
2    Matt    28   89.5
3  Ashley    25   92.3
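
To see how the constructor interprets the dictionary, the following sketch inspects the resulting column labels and data types. The variable name `people` is only illustrative.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values.
people = pd.DataFrame({
    "Names": ["Jane", "John", "Matt", "Ashley"],
    "Ages": [26, 24, 28, 25],
    "Score": [91.2, 94.1, 89.5, 92.3],
})

# Column order follows the insertion order of the dictionary keys.
print(list(people.columns))

# pandas infers one data type per column from the values.
print(people.dtypes)
```

All lists in the dictionary need the same length, since every column of a DataFrame shares the same row index.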


## Two-dimensional array
A DataFrame is a two-dimensional data structure that consists of rows and columns, so a two-dimensional array converts naturally into one. For instance, the DataFrame constructor accepts NumPy arrays.

The code below creates DataFrames from NumPy arrays. By default, the columns are labeled with integers starting at 0, but we can assign names with the columns parameter.

In [23]:
import numpy as np
import pandas as pd

arr = np.random.randint(1, 10, size=(3,5))
df = pd.DataFrame(arr, columns=["A","B","C","D","E"])
print(df)

arr2 = np.random.randint(-11,11,size=(10,3))
df2 = pd.DataFrame(arr2, columns=["X","Y","Z"])
print(df2)


   A  B  C  D  E
0  5  5  3  3  6
1  9  1  8  8  9
2  7  2  8  6  4
    X   Y   Z
0 -10  -5  -5
1  -7  -7   6
2   7   9  -3
3   9   8   6
4   9   0 -10
5   0   7   5
6   2  -1 -11
7   4  -4  -6
8 -10 -11   1
9  -5  -4  10
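
The default labeling is easiest to see side by side. This sketch uses a fixed array instead of random values so the result is reproducible; the names `df_default` and `df_named` are illustrative.

```python
import numpy as np
import pandas as pd

# A fixed 3x5 array (values 0..14) so the example is reproducible.
arr = np.arange(15).reshape(3, 5)

# Without the columns parameter, pandas labels the columns 0, 1, 2, ...
df_default = pd.DataFrame(arr)
print(list(df_default.columns))

# With the columns parameter, the labels are the names we pass in.
df_named = pd.DataFrame(arr, columns=["A", "B", "C", "D", "E"])
print(list(df_named.columns))
```

The number of names passed to `columns` has to match the number of columns in the array.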


# Size of a DataFrame
## The shape and size attributes and the len function
We should always check the data’s size before analyzing it. The size of a DataFrame can be expressed in terms of the number of rows and columns.

The **shape** attribute returns a tuple that contains the number of rows and the number of columns. 
The **size** attribute returns the number of rows multiplied by the number of columns, that is, the total number of cells in the DataFrame. 
The built-in Python function **len** returns the number of rows in a DataFrame. Let’s check the size of the sales data using all three.

In [25]:
import pandas as pd
sales = pd.read_csv("sales.csv")
print(sales.shape)
print(sales.size)
print(len(sales))

# info() prints its summary directly and returns None, so it needs no print()
sales.info()

(1000, 7)
7000
1000
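
The relationship between the three values can be verified on a small DataFrame built in memory. This is a minimal sketch; the name `small` is illustrative.

```python
import pandas as pd

small = pd.DataFrame({
    "Name": ["Jane", "John", "Matt", "Ashley"],
    "Age": [24, 21, 26, 32],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = small.shape

print(small.shape)   # rows and columns
print(small.size)    # total number of cells
print(len(small))    # number of rows

# size equals rows * columns, and len equals the row count.
print(small.size == n_rows * n_cols)
print(len(small) == n_rows)
```

Note that `shape` and `size` are accessed without parentheses, while `len` is called as a function.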
