# What's Tabular Data?

Tabular data is data organized into rows and columns, following the principles of "tidy data":
- Each row corresponds to a single observation.
- Each column corresponds to a single feature of the observations.
- Each cell contains only one value.

## Lists

In [1]:
house_0_list = [115910.26, 128, 4]
house_0_list

[115910.26, 128, 4]

## Accessing List Items

In [2]:
house_0_price_m2 = house_0_list[0] / house_0_list[1]
house_0_price_m2

905.54890625

## `append()` Method

We use the `append()` method to add an item to the end of a list.

In [3]:
house_0_list.append(house_0_price_m2)
house_0_list

[115910.26, 128, 4, 905.54890625]

## Nested List

A list inside another list is called a nested list.

In [4]:
houses_nested_list = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
    [36932.27, 79.0, 3.0],
    [83903.51, 111.0, 3.0],
]

houses_nested_list

[[115910.26, 128.0, 4.0],
 [48718.17, 210.0, 3.0],
 [28977.56, 58.0, 2.0],
 [36932.27, 79.0, 3.0],
 [83903.51, 111.0, 3.0]]

## `for` Loop

A `for` loop executes a set of statements once for each item in a list.

In [5]:

# Append the price per square meter (price / area) to each house
for house in houses_nested_list:
    house_m2 = house[0] / house[1]
    house.append(house_m2)

houses_nested_list

[[115910.26, 128.0, 4.0, 905.54890625],
 [48718.17, 210.0, 3.0, 231.9912857142857],
 [28977.56, 58.0, 2.0, 499.61310344827587],
 [36932.27, 79.0, 3.0, 467.4970886075949],
 [83903.51, 111.0, 3.0, 755.8874774774774]]

## Dictionaries

A dictionary is a data structure in which each value is associated with a key.

In [6]:
house_0_dict = {
    "price_aprox_usd": 115910.26,
    "surface_covered_in_m2": 128,
    "rooms": 4,
}

house_0_dict

{'price_aprox_usd': 115910.26, 'surface_covered_in_m2': 128, 'rooms': 4}

## Accessing Dictionary Items

Dictionary items can be accessed through their keys.

In [7]:
house_0_dict["price_per_m2"] = house_0_dict["price_aprox_usd"] / house_0_dict["surface_covered_in_m2"]
house_0_dict

{'price_aprox_usd': 115910.26,
 'surface_covered_in_m2': 128,
 'rooms': 4,
 'price_per_m2': 905.54890625}

## List of Dictionaries

We can also create a list of dictionaries: a list in which each item is a dictionary.

In [8]:
houses_rowwise = [
    {
        "price_aprox_usd": 115910.26,
        "surface_covered_in_m2": 128,
        "rooms": 4,
    },
    {
        "price_aprox_usd": 48718.17,
        "surface_covered_in_m2": 210,
        "rooms": 3,
    },
    {
        "price_aprox_usd": 28977.56,
        "surface_covered_in_m2": 58,
        "rooms": 2,
    },
    {
        "price_aprox_usd": 36932.27,
        "surface_covered_in_m2": 79,
        "rooms": 3,
    },
    {
        "price_aprox_usd": 83903.51,
        "surface_covered_in_m2": 111,
        "rooms": 3,
    },
]

houses_rowwise

[{'price_aprox_usd': 115910.26, 'surface_covered_in_m2': 128, 'rooms': 4},
 {'price_aprox_usd': 48718.17, 'surface_covered_in_m2': 210, 'rooms': 3},
 {'price_aprox_usd': 28977.56, 'surface_covered_in_m2': 58, 'rooms': 2},
 {'price_aprox_usd': 36932.27, 'surface_covered_in_m2': 79, 'rooms': 3},
 {'price_aprox_usd': 83903.51, 'surface_covered_in_m2': 111, 'rooms': 3}]

## Iterating over List of Dictionaries

In [9]:

for house in houses_rowwise:
    house["price_per_m2"] = house["price_aprox_usd"] / house["surface_covered_in_m2"]

houses_rowwise

[{'price_aprox_usd': 115910.26,
  'surface_covered_in_m2': 128,
  'rooms': 4,
  'price_per_m2': 905.54890625},
 {'price_aprox_usd': 48718.17,
  'surface_covered_in_m2': 210,
  'rooms': 3,
  'price_per_m2': 231.9912857142857},
 {'price_aprox_usd': 28977.56,
  'surface_covered_in_m2': 58,
  'rooms': 2,
  'price_per_m2': 499.61310344827587},
 {'price_aprox_usd': 36932.27,
  'surface_covered_in_m2': 79,
  'rooms': 3,
  'price_per_m2': 467.4970886075949},
 {'price_aprox_usd': 83903.51,
  'surface_covered_in_m2': 111,
  'rooms': 3,
  'price_per_m2': 755.8874774774774}]

## `JSON`

- The printed output above has the same structure as a list of `JSON` objects.
- Each object is enclosed in `{}`.
- Each object holds one complete record (observation).
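
The link between a list of dictionaries and `JSON` can be checked with Python's built-in `json` module. The sketch below is only an illustration: it uses a shortened, two-house copy of the list, and the names `houses_sample`, `houses_json`, and `houses_roundtrip` are assumptions introduced here. One visible difference is that Python prints strings with single quotes, while `JSON` requires double quotes.

```python
import json

# Shortened two-house copy of the row-wise data (illustrative only)
houses_sample = [
    {"price_aprox_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_aprox_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
]

# Serialize the list of dictionaries to a JSON-formatted string
houses_json = json.dumps(houses_sample, indent=2)
print(houses_json)

# Parse the JSON string back into Python lists and dictionaries
houses_roundtrip = json.loads(houses_json)
```

After the round trip, `houses_roundtrip` should compare equal to `houses_sample`, since every value here is a number or a string key that `JSON` can represent directly.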

## Calculating Mean from Observations

In [10]:
house_prices = []
for house in houses_rowwise:
    house_prices.append(house['price_aprox_usd'])
    
mean_house_price = sum(house_prices) / len(house_prices)

mean_house_price

62888.35399999999

For easier calculations, organize the data by features (columns) instead of observations (rows):
- Each dictionary key (column name) maps to a list holding that feature's values for all observations.
- Every entry in a list then has the same type, so the data stays homogeneous.
- Calculations can be run directly on a column, without first collecting values from each row.

In [11]:
houses_columnwise = {
    "price_aprox_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

houses_columnwise

{'price_aprox_usd': [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
 'surface_covered_in_m2': [128.0, 210.0, 58.0, 79.0, 111.0],
 'rooms': [4.0, 3.0, 2.0, 3.0, 3.0]}
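
A column-wise dictionary like this does not have to be typed by hand; it can also be built from row-wise data. The following is a minimal sketch under that assumption, using a shortened three-house list. The names `rows_sample` and `columns_from_rows` are hypothetical and not part of the original notebook.

```python
# Shortened three-house copy of the row-wise data (illustrative only)
rows_sample = [
    {"price_aprox_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_aprox_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_aprox_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
]

columns_from_rows = {}
for house in rows_sample:
    for key, value in house.items():
        # Create the column's list on first sight of the key, then append to it
        columns_from_rows.setdefault(key, []).append(value)

print(columns_from_rows)
```

Each key now maps to one list per feature, with values in the same order as the original rows.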

## Calculating Mean from Features

In [12]:
mean_house_price = sum(houses_columnwise['price_aprox_usd']) / len(houses_columnwise['price_aprox_usd'])

mean_house_price

62888.35399999999

## `zip`

`zip()` pairs up the items at the same position across multiple lists, so we can iterate over them together.

In [13]:
price_per_m2 = []
for price, surface in zip(houses_columnwise['price_aprox_usd'], houses_columnwise['surface_covered_in_m2']):
    price_per_m2.append(price / surface)
houses_columnwise['price_per_m2'] = price_per_m2
houses_columnwise

{'price_aprox_usd': [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
 'surface_covered_in_m2': [128.0, 210.0, 58.0, 79.0, 111.0],
 'rooms': [4.0, 3.0, 2.0, 3.0, 3.0],
 'price_per_m2': [905.54890625,
  231.9912857142857,
  499.61310344827587,
  467.4970886075949,
  755.8874774774774]}

# Tabular Data and pandas DataFrames

One of the best-known data science libraries is `pandas`, which allows us to organize data into `DataFrames`. A `DataFrame` is a tabular data structure, similar to a spreadsheet.

In [14]:
import pandas as pd

data = {
    "price_aprox_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

df_houses = pd.DataFrame(data)

df_houses

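|   | price_aprox_usd | surface_covered_in_m2 | rooms |
|---|-----------------|-----------------------|-------|
| 0 | 115910.26 | 128.0 | 4.0 |
| 1 | 48718.17 | 210.0 | 3.0 |
| 2 | 28977.56 | 58.0 | 2.0 |
| 3 | 36932.27 | 79.0 | 3.0 |
| 4 | 83903.51 | 111.0 | 3.0 |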
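
With the data in a `DataFrame`, the mean price and the price per square meter computed earlier with loops and `zip` can be obtained from whole columns at once. This is a sketch of one possible way to do it, not necessarily the approach the rest of the material takes; it assumes `pandas` is installed and rebuilds the same `data` dictionary so that it runs on its own.

```python
import pandas as pd

data = {
    "price_aprox_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}
df_houses = pd.DataFrame(data)

# Mean of one column, with no explicit loop
mean_house_price = df_houses["price_aprox_usd"].mean()

# Element-wise division of two columns, stored as a new column
df_houses["price_per_m2"] = (
    df_houses["price_aprox_usd"] / df_houses["surface_covered_in_m2"]
)

print(mean_house_price)
print(df_houses)
```

The mean may differ from the loop-based result in the last decimal places, because floating-point sums can be accumulated in a different order.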
