# Pandas
- Solve short hands-on challenges to perfect your data manipulation skills.
- https://www.kaggle.com/learn/pandas

## Creating, Reading and Writing
- You can't work with data if you can't read it. Get started here.

In [1]:
import pandas as pd

pd.__version__

'1.5.3'

### Creating Data

#### DataFrames
A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

In [2]:
# Build a DataFrame from a dict that maps column names to lists of values
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
df

,Yes,No
0,50,131
1,21,2


In [3]:
df_str = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})
df_str

,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland.


In [4]:
# Use the constructor's index parameter to replace the default integer index with row labels
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.
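
Since each entry corresponds to one row and one column, a single value can be looked up by its row label and column label. The sketch below rebuilds the two small DataFrames from the cells above and reads one entry from each with `.loc`; the names `reviews`, `entry_default`, and `entry_labeled` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Same data as the cells above; variable names are illustrative.
df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})
reviews = pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'],
                        'Sue': ['Pretty good.', 'Bland.']},
                       index=['Product A', 'Product B'])

# .loc takes a row label and a column label and returns a single entry.
entry_default = df.loc[0, 'No']                   # default integer row label 0
entry_labeled = reviews.loc['Product A', 'Sue']   # custom row label

print(entry_default)   # 131
print(entry_labeled)   # Pretty good.
```

With the default index the row label is simply the integer position; with a custom index the same lookup uses the label passed to `index`.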


#### Series
A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [5]:
s = pd.Series([1, 2, 3, 4, 5])
s

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [7]:
# Series with row labels set via the index parameter, and an overall Series name
s1 = pd.Series([30, 35, 40],
               index=['2015 Sales', '2016 Sales', '2017 Sales'],
               name='Product A')

s1

2015 Sales    30
2016 Sales    35
2017 Sales    40
Name: Product A, dtype: int64
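
One way to see the "table versus list" relationship is to turn the named Series into a one-column DataFrame and then pull the column back out. This is a minimal sketch that reuses the `s1` values from the cell above; the names `frame` and `column` are illustrative.

```python
import pandas as pd

s1 = pd.Series([30, 35, 40],
               index=['2015 Sales', '2016 Sales', '2017 Sales'],
               name='Product A')

# to_frame() wraps the Series in a DataFrame; the Series name becomes the column name.
frame = s1.to_frame()
print(frame.shape)           # (3, 1)
print(list(frame.columns))   # ['Product A']

# Selecting a single column from a DataFrame gives back a Series.
column = frame['Product A']
print(type(column).__name__)  # Series
```

The row labels carry over in both directions, so the Series index and the DataFrame index are the same here.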

### Reading data files

In [8]:
redwine_df = pd.read_csv('Red.csv')
redwine_df

,Name,Country,Region,Winery,Rating,NumberOfRatings,Price,Year
0,Pomerol 2011,France,Pomerol,Château La Providence,4.2,100,95.00,2011
1,Lirac 2017,France,Lirac,Château Mont-Redon,4.3,100,15.50,2017
2,Erta e China Rosso di Toscana 2015,Italy,Toscana,Renzo Masi,3.9,100,7.45,2015
3,Bardolino 2019,Italy,Bardolino,Cavalchina,3.5,100,8.72,2019
4,Ried Scheibner Pinot Noir 2016,Austria,Carnuntum,Markowitsch,3.9,100,29.15,2016
...,...,...,...,...,...,...,...,...
8661,6th Sense Syrah 2016,United States,Lodi,Michael David Winery,3.8,994,16.47,2016
8662,Botrosecco Maremma Toscana 2016,Italy,Maremma Toscana,Le Mortelle,4.0,995,20.09,2016
8663,Haut-Médoc 2010,France,Haut-Médoc,Château Cambon La Pelouse,3.7,996,23.95,2010
8664,Shiraz 2019,Australia,South Eastern Australia,Yellow Tail,3.5,998,6.21,2019


In [9]:
redwine_df.shape

(8666, 8)

In [23]:
# Return a multiline string holding a small table (one header row, one value row)
# that summarizes the size of a large DataFrame: rows, columns, and total entries (rows * cols).

def mk_table(rows, cols):
    line = '-' * (1 + 9 * 3 + 13) + '\n'
    titln = f"| {'rows':^9} | {'columns':^9} | {'entries':^13} |\n"
    valsln = f"| {rows:>9,} | {cols:>9} | {(rows * cols):>13,} |\n"
    return line + titln + line + valsln + line

print(mk_table(129971, 14))     # winemag-data-130k-v2.csv dataset case

-----------------------------------------
|   rows    |  columns  |    entries    |
-----------------------------------------
|   129,971 |        14 |     1,819,594 |
-----------------------------------------



In [22]:
print(mk_table(redwine_df.shape[0], redwine_df.shape[1]))

-----------------------------------------
|   rows    |  columns  |    entries    |
-----------------------------------------
|     8,666 |         8 |        69,328 |
-----------------------------------------
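
The "entries" figure in the table is just rows multiplied by columns. As a hedged aside, pandas also reports that number directly through the `size` attribute. The sketch below uses a small stand-in DataFrame rather than `Red.csv`, so the names `small_df`, `n_rows`, `n_cols`, and `n_entries` are illustrative.

```python
import pandas as pd

# Small stand-in DataFrame (same values as the first example in this notebook).
small_df = pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

# shape is a (rows, columns) tuple, so it can be unpacked directly.
n_rows, n_cols = small_df.shape

# size is the total number of entries, i.e. rows * columns.
n_entries = small_df.size

print(n_rows, n_cols, n_entries)   # 2 2 4
```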



In [25]:
display(redwine_df.head())
display(redwine_df.tail())

,Name,Country,Region,Winery,Rating,NumberOfRatings,Price,Year
0,Pomerol 2011,France,Pomerol,Château La Providence,4.2,100,95.0,2011
1,Lirac 2017,France,Lirac,Château Mont-Redon,4.3,100,15.5,2017
2,Erta e China Rosso di Toscana 2015,Italy,Toscana,Renzo Masi,3.9,100,7.45,2015
3,Bardolino 2019,Italy,Bardolino,Cavalchina,3.5,100,8.72,2019
4,Ried Scheibner Pinot Noir 2016,Austria,Carnuntum,Markowitsch,3.9,100,29.15,2016


,Name,Country,Region,Winery,Rating,NumberOfRatings,Price,Year
8661,6th Sense Syrah 2016,United States,Lodi,Michael David Winery,3.8,994,16.47,2016
8662,Botrosecco Maremma Toscana 2016,Italy,Maremma Toscana,Le Mortelle,4.0,995,20.09,2016
8663,Haut-Médoc 2010,France,Haut-Médoc,Château Cambon La Pelouse,3.7,996,23.95,2010
8664,Shiraz 2019,Australia,South Eastern Australia,Yellow Tail,3.5,998,6.21,2019
8665,Portillo Cabernet Sauvignon 2016,Argentina,Tunuyán,Salentein,3.4,999,7.88,2016


In [26]:
redwine_df1 = pd.read_csv('Red.csv', index_col=0)
redwine_df1

Name,Country,Region,Winery,Rating,NumberOfRatings,Price,Year
Pomerol 2011,France,Pomerol,Château La Providence,4.2,100,95.00,2011
Lirac 2017,France,Lirac,Château Mont-Redon,4.3,100,15.50,2017
Erta e China Rosso di Toscana 2015,Italy,Toscana,Renzo Masi,3.9,100,7.45,2015
Bardolino 2019,Italy,Bardolino,Cavalchina,3.5,100,8.72,2019
Ried Scheibner Pinot Noir 2016,Austria,Carnuntum,Markowitsch,3.9,100,29.15,2016
...,...,...,...,...,...,...,...
6th Sense Syrah 2016,United States,Lodi,Michael David Winery,3.8,994,16.47,2016
Botrosecco Maremma Toscana 2016,Italy,Maremma Toscana,Le Mortelle,4.0,995,20.09,2016
Haut-Médoc 2010,France,Haut-Médoc,Château Cambon La Pelouse,3.7,996,23.95,2010
Shiraz 2019,Australia,South Eastern Australia,Yellow Tail,3.5,998,6.21,2019


In [27]:
print(mk_table(redwine_df1.shape[0], redwine_df1.shape[1]))

-----------------------------------------
|   rows    |  columns  |    entries    |
-----------------------------------------
|     8,666 |         7 |        60,662 |
-----------------------------------------
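
The column count drops from 8 to 7 because `index_col=0` turns the first CSV column (`Name`) into the row index instead of a regular column. The sketch below reproduces this on an in-memory CSV built from the first two rows shown above; the header layout of the real `Red.csv` is assumed from the displayed output, and `csv_text`, `default_idx`, and `name_idx` are illustrative names.

```python
import io
import pandas as pd

# Two rows copied from the output above; the header line is an assumption about Red.csv.
csv_text = """Name,Country,Region,Winery,Rating,NumberOfRatings,Price,Year
Pomerol 2011,France,Pomerol,Château La Providence,4.2,100,95.00,2011
Lirac 2017,France,Lirac,Château Mont-Redon,4.3,100,15.50,2017
"""

# Default: pandas adds a 0..n-1 integer index and keeps all 8 columns.
default_idx = pd.read_csv(io.StringIO(csv_text))

# index_col=0: the first column becomes the index, leaving 7 data columns.
name_idx = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(default_idx.shape)     # (2, 8)
print(name_idx.shape)        # (2, 7)
print(name_idx.index.name)   # Name
```

The row count is unchanged in both cases; only the role of the first column differs.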

