## 1.1. Organizing Tabular Data in Python

Information can come in many forms, and part of a data scientist's job is making sure that information is organized in a way that's conducive to analysis. Take, for example, these five houses from the Mexico real estate dataset we'll use in this project:

![Five houses showing price, number of rooms, and area in square meters for each](../images/proj-1.001.png)

One common way to organize this information is in a **table**, which is a group of **cells** organized into **rows** and **columns**:

![Table, organized into rows and columns, with housing information from previous image](../images/proj-1.002.png)

When working with this sort of **tabular data**, it's important to organize rows and columns following the principles of "**[tidy data](https://en.wikipedia.org/wiki/Tidy_data)**." What does that mean for our dataset?

1. Each row corresponds to a single house in our dataset. We'll call each of these houses an **observation**.
2. Each column corresponds to a characteristic of each house. We'll call these **features**.
3. Each cell contains only one **value**. 

![Three copies of table from previous image, emphasizing observations as rows and features as columns](../images/proj-1.003.png)

So whenever you encounter a new dataset, make sure your data is "tidy."
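To make the third principle concrete, here is a minimal sketch of tidying a single record. It assumes a hypothetical untidy record in which area and room count are packed into one cell; the `size` key and its `"128 m2, 4 rooms"` text format are invented for illustration and are not part of the real dataset.

```python
# Hypothetical "untidy" record: two features (area and rooms) share one cell
untidy_house = {"price_approx_usd": 115910.26, "size": "128 m2, 4 rooms"}


def tidy_house(record):
    """Split the packed 'size' cell into two single-value features."""
    area_text, rooms_text = record["size"].split(", ")  # "128 m2", "4 rooms"
    return {
        "price_approx_usd": record["price_approx_usd"],
        "surface_covered_in_m2": float(area_text.split()[0]),  # "128" -> 128.0
        "rooms": int(rooms_text.split()[0]),                   # "4" -> 4
    }


tidy = tidy_house(untidy_house)
print(tidy)
```

After tidying, every cell holds exactly one value, so each feature can become its own column.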

# Tabular Data and Python Data Structures

## Working with Lists

Python comes with several data structures that we can use to organize tabular data. Let's start by putting a single observation in a **list**.

In [1]:
# Declare variable `house_0_list`
house_0_list = [115910.26, 128, 4]

# Print object type of `house_0_list`
# (We'll learn more about object types in later projects 😉)
print("house_0_list type:", type(house_0_list))

# Print length of `house_0_list`
print("house_0_list length:", len(house_0_list))

# Get output of `house_0_list`
house_0_list

house_0_list type: <class 'list'>
house_0_list length: 3


[115910.26, 128, 4]
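In a list, the meaning of each value comes only from its position. A short sketch of positional access, using the same three values as `house_0_list` (price, area, rooms):

```python
house_0_list = [115910.26, 128, 4]

# Positional indexing: index 0 is price, 1 is area, 2 is rooms
price = house_0_list[0]
area = house_0_list[1]
rooms = house_0_list[-1]  # a negative index counts from the end

# Tuple unpacking assigns all three positions in one line
price_u, area_u, rooms_u = house_0_list

print(price, area, rooms)
```

Because nothing in the list itself records which position holds which feature, you have to remember the order, which is one motivation for the dictionaries used later.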

In [None]:
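# Calculate price / sq. meter from price (index 0) and area (index 1)
house_0_price_m2 = house_0_list[0] / house_0_list[1]

# Append price / sq. meter to `house_0_list`
house_0_list.append(house_0_price_m2)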

# Print object type of `house_0_list`
print("house_0_list type:", type(house_0_list))

# Print length of `house_0_list`
print("house_0_list length:", len(house_0_list))

# Get output of `house_0_list`
house_0_list
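The cell above relies on `list.append`, which changes the list in place and returns `None`. A minimal sketch of that behavior, using a fresh copy of the house values:

```python
house = [115910.26, 128, 4]

# list.append modifies the list in place; its return value is None
result = house.append(house[0] / house[1])

print(result)      # None
print(len(house))  # 4

# Pitfall: writing `house = house.append(...)` would replace the list with None
```

Because the list is mutated, re-running the notebook cell above appends a second price-per-m² value, and `house_0_list` grows to length 5.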

Now that you can work with data for a single house, let's think about how to organize the whole dataset. One option is to create a list for each observation and then put those lists together in another list. This is called a **nested list**.

In [None]:
# Declare variable `houses_nested_list`
houses_nested_list = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
    [36932.27, 79.0, 3.0],
    [83903.51, 111.0, 3.0],
]

# Print `houses_nested_list` type
print("houses_nested_list type:", type(houses_nested_list))

# Print `houses_nested_list` length
print("houses_nested_list length:", len(houses_nested_list))

# Get output of `houses_nested_list`
houses_nested_list
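With a nested list, the outer index selects an observation (row) and the inner index selects a feature (column). A sketch of both directions, using the same five houses:

```python
houses_nested_list = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
    [36932.27, 79.0, 3.0],
    [83903.51, 111.0, 3.0],
]

# One cell: second house (row 1), area (column 1)
second_house_area = houses_nested_list[1][1]

# One feature across all observations: pick index 0 from every row
prices = [house[0] for house in houses_nested_list]

# zip(*rows) transposes the nested list into one tuple per column
price_col, area_col, rooms_col = zip(*houses_nested_list)

print(second_house_area)
print(prices)
print(rooms_col)
```

Getting a row is a single index, but getting a column requires visiting every row, which is why the column-oriented layout later in this notebook can be convenient.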

In [None]:
# Create for loop to iterate through `houses_nested_list`
for house in houses_nested_list:
    # For each observation, append price / sq. meter
    price_m2 = house[0] / house[1]
    house.append(price_m2)

# Print `houses_nested_list` type
print("houses_nested_list type:", type(houses_nested_list))

# Print `houses_nested_list` length
print("houses_nested_list length:", len(houses_nested_list))

# Get output of `houses_nested_list`
houses_nested_list
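The loop above mutates each inner list, so running the cell a second time appends another price-per-m² value to every row. As an alternative (a swapped-in technique, not the notebook's own method), a list comprehension can build a new nested list and leave the original rows untouched:

```python
houses_nested_list = [
    [115910.26, 128.0, 4.0],
    [48718.17, 210.0, 3.0],
    [28977.56, 58.0, 2.0],
    [36932.27, 79.0, 3.0],
    [83903.51, 111.0, 3.0],
]

# row + [...] creates a fresh list, so each source row keeps its 3 values
houses_with_price_m2 = [row + [row[0] / row[1]] for row in houses_nested_list]

print(len(houses_nested_list[0]))    # 3
print(len(houses_with_price_m2[0]))  # 4
```

Re-running this version always produces rows of length 4, at the cost of holding a second copy of the data.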

In [None]:
# Declare variable `house_0_dict`
house_0_dict = {
    "price_approx_usd": 115910.26,
    "surface_covered_in_m2": 128,
    "rooms": 4,
}

# Print `house_0_dict` type
print("house_0_dict type:", type(house_0_dict))

# Get output of `house_0_dict`
house_0_dict
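A dictionary labels each value with a key, so you look features up by name instead of by position. A sketch of key access and of `dict.get`, which returns a default for a missing key instead of raising `KeyError`; the `"bathrooms"` key is hypothetical and intentionally absent:

```python
house_0_dict = {
    "price_approx_usd": 115910.26,
    "surface_covered_in_m2": 128,
    "rooms": 4,
}

# Look up a feature by its name
rooms = house_0_dict["rooms"]

# .get() returns None (or a supplied default) when the key is missing
bathrooms = house_0_dict.get("bathrooms")
bathrooms_or_zero = house_0_dict.get("bathrooms", 0)

print(rooms, bathrooms, bathrooms_or_zero)
print(list(house_0_dict.keys()))
```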

In [None]:
# Add "price_per_m2" key-value pair to `house_0_dict`
house_0_dict["price_per_m2"] = (
    house_0_dict["price_approx_usd"] / house_0_dict["surface_covered_in_m2"]
)

# Get output of `house_0_dict`
house_0_dict
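Assigning to a new key changes `house_0_dict` in place. If you would rather keep the original dictionary unchanged, one option (an alternative to the cell above, not a required step) is dictionary unpacking, which builds a new dictionary:

```python
house_0_dict = {
    "price_approx_usd": 115910.26,
    "surface_covered_in_m2": 128,
    "rooms": 4,
}

# {**d, key: value} copies d's pairs into a new dict and adds one more
house_0_with_ppm2 = {
    **house_0_dict,
    "price_per_m2": house_0_dict["price_approx_usd"] / house_0_dict["surface_covered_in_m2"],
}

print("price_per_m2" in house_0_dict)  # False
print(round(house_0_with_ppm2["price_per_m2"], 2))
```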

In [None]:
# Declare variable `houses_rowwise`
houses_rowwise = [
    {
        "price_approx_usd": 115910.26,
        "surface_covered_in_m2": 128,
        "rooms": 4,
    },
    {
        "price_approx_usd": 48718.17,
        "surface_covered_in_m2": 210,
        "rooms": 3,
    },
    {
        "price_approx_usd": 28977.56,
        "surface_covered_in_m2": 58,
        "rooms": 2,
    },
    {
        "price_approx_usd": 36932.27,
        "surface_covered_in_m2": 79,
        "rooms": 3,
    },
    {
        "price_approx_usd": 83903.51,
        "surface_covered_in_m2": 111,
        "rooms": 3,
    },
]

# Print `houses_rowwise` object type
print("houses_rowwise type:", type(houses_rowwise))

# Print `houses_rowwise` length
print("houses_rowwise length:", len(houses_rowwise))

# Get output of `houses_rowwise`
houses_rowwise
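In a row-wise layout, tidiness depends on every dictionary having the same keys, since each key plays the role of a column. A minimal sketch of a check for that; the function name `has_consistent_features` and the extra record missing `"rooms"` are hypothetical:

```python
houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
    {"price_approx_usd": 36932.27, "surface_covered_in_m2": 79, "rooms": 3},
    {"price_approx_usd": 83903.51, "surface_covered_in_m2": 111, "rooms": 3},
]


def has_consistent_features(records):
    """Return True if every record (row) has exactly the same keys (features)."""
    if not records:
        return True
    expected = set(records[0])
    return all(set(record) == expected for record in records)


print(has_consistent_features(houses_rowwise))  # True

# A hypothetical record with no "rooms" key breaks the one-key-per-feature layout
broken = houses_rowwise + [{"price_approx_usd": 50000.0, "surface_covered_in_m2": 90}]
print(has_consistent_features(broken))  # False
```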

In [None]:
# Create for loop to iterate through `houses_rowwise`
for house in houses_rowwise:
    # For each observation, add "price_per_m2" key-value pair
    house["price_per_m2"] = house["price_approx_usd"] / house["surface_covered_in_m2"]

# Print `houses_rowwise` object type
print("houses_rowwise type:", type(houses_rowwise))

# Print `houses_rowwise` length
print("houses_rowwise length:", len(houses_rowwise))

# Get output of `houses_rowwise`
houses_rowwise
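One convenience of keeping each observation in its own dictionary is that reordering rows keeps each house's features together. A sketch using the built-in `sorted` with a `key` function to rank houses by the new `"price_per_m2"` feature:

```python
houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
    {"price_approx_usd": 36932.27, "surface_covered_in_m2": 79, "rooms": 3},
    {"price_approx_usd": 83903.51, "surface_covered_in_m2": 111, "rooms": 3},
]

# Add price / sq. meter to each record, as in the cell above
for house in houses_rowwise:
    house["price_per_m2"] = house["price_approx_usd"] / house["surface_covered_in_m2"]

# sorted() returns a new list of whole records ordered by the key function
by_price_m2 = sorted(houses_rowwise, key=lambda house: house["price_per_m2"])

cheapest = by_price_m2[0]
print(cheapest["price_approx_usd"], round(cheapest["price_per_m2"], 2))
```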

In [None]:
# Declare `house_prices` as empty list
house_prices = []

# Iterate through `houses_rowwise`
for house in houses_rowwise:
    # For each house, append "price_approx_usd" to `house_prices`
    house_prices.append(house["price_approx_usd"])

# Calculate `mean_house_price` using `house_prices`
mean_house_price = sum(house_prices) / len(house_prices)

# Print `mean_house_price` object type
print("mean_house_price type:", type(mean_house_price))

# Get output of `mean_house_price`
mean_house_price
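The cell above collects prices into an intermediate list before averaging. A more compact sketch uses a generator expression, and the standard library's `statistics.mean` gives the same average:

```python
import statistics

houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
    {"price_approx_usd": 36932.27, "surface_covered_in_m2": 79, "rooms": 3},
    {"price_approx_usd": 83903.51, "surface_covered_in_m2": 111, "rooms": 3},
]

# Generator expression: sum prices without building a separate list variable
mean_price = sum(house["price_approx_usd"] for house in houses_rowwise) / len(houses_rowwise)

# statistics.mean accepts any iterable of numbers
mean_price_stats = statistics.mean(house["price_approx_usd"] for house in houses_rowwise)

print(round(mean_price, 2))
```

Either way, a row-wise layout means visiting every dictionary to gather one feature.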

In [None]:
# Declare variable `houses_columnwise`
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

# Print `houses_columnwise` object type
print("houses_columnwise type:", type(houses_columnwise))

# Get output of `houses_columnwise`
houses_columnwise
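The row-wise and column-wise layouts hold the same information, so you can convert one into the other. A minimal sketch with a dictionary comprehension; the helper name `rows_to_columns` is hypothetical, and it assumes a non-empty list whose records all share the first record's keys:

```python
houses_rowwise = [
    {"price_approx_usd": 115910.26, "surface_covered_in_m2": 128, "rooms": 4},
    {"price_approx_usd": 48718.17, "surface_covered_in_m2": 210, "rooms": 3},
    {"price_approx_usd": 28977.56, "surface_covered_in_m2": 58, "rooms": 2},
    {"price_approx_usd": 36932.27, "surface_covered_in_m2": 79, "rooms": 3},
    {"price_approx_usd": 83903.51, "surface_covered_in_m2": 111, "rooms": 3},
]


def rows_to_columns(records):
    """Turn a list of dicts (one per row) into a dict of lists (one per column)."""
    return {key: [record[key] for record in records] for key in records[0]}


houses_columnwise = rows_to_columns(houses_rowwise)
print(houses_columnwise["rooms"])  # [4, 3, 2, 3, 3]
```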

In [None]:
# Calculate `mean_house_price` using `houses_columnwise`
mean_house_price = sum(houses_columnwise["price_approx_usd"]) / len(houses_columnwise["price_approx_usd"])

# Print `mean_house_price` object type
print("mean_house_price type:", type(mean_house_price))

# Get output of `mean_house_price`
mean_house_price
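Because each feature is already its own list, the column-wise layout makes per-feature summaries short. A sketch that computes the mean of every column at once with a dictionary comprehension:

```python
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

# One mean per feature: iterate over (feature name, list of values) pairs
column_means = {
    feature: sum(values) / len(values)
    for feature, values in houses_columnwise.items()
}

print(column_means)
```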

In [None]:
# Add "price_per_m2" key-value pair for `houses_columnwise`
price = houses_columnwise["price_approx_usd"]
area = houses_columnwise["surface_covered_in_m2"]

# Pair each price with its area and compute price / sq. meter
price_per_m2 = []
for p, a in zip(price, area):
    price_per_m2.append(p / a)

houses_columnwise["price_per_m2"] = price_per_m2

# Print `houses_columnwise` object type
print("houses_columnwise type:", type(houses_columnwise))

# Get output of `houses_columnwise`
houses_columnwise
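The cell above builds `price_per_m2` with an explicit loop over `zip(price, area)`. Note that `zip` silently stops at the shorter input, so mismatched column lengths would go unnoticed. A sketch that checks lengths first and uses a list comprehension:

```python
houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}

price = houses_columnwise["price_approx_usd"]
area = houses_columnwise["surface_covered_in_m2"]

# Guard against zip() dropping values when the columns differ in length
if len(price) != len(area):
    raise ValueError("price and area columns must have the same length")

houses_columnwise["price_per_m2"] = [p / a for p, a in zip(price, area)]
print(len(houses_columnwise["price_per_m2"]))  # 5
```

On Python 3.10 and later, `zip(price, area, strict=True)` performs the same length check by raising `ValueError`.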

In [None]:
# Import pandas library, aliased as `pd`
import pandas as pd

# Declare variable `df_houses`
df_houses = pd.DataFrame(houses_columnwise)

# Print `df_houses` object type
print("df_houses type:", type(df_houses))

# Print `df_houses` shape
print("df_houses shape:", df_houses.shape)

# Get output of `df_houses`
df_houses
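A pandas DataFrame removes the need for explicit loops: arithmetic between columns is applied to every row at once. A sketch, assuming pandas is installed, that adds price per m², computes a mean, and round-trips through a row-wise list of dictionaries with `to_dict(orient="records")`:

```python
import pandas as pd

houses_columnwise = {
    "price_approx_usd": [115910.26, 48718.17, 28977.56, 36932.27, 83903.51],
    "surface_covered_in_m2": [128.0, 210.0, 58.0, 79.0, 111.0],
    "rooms": [4.0, 3.0, 2.0, 3.0, 3.0],
}
df_houses = pd.DataFrame(houses_columnwise)

# Vectorized column arithmetic: no loop over rows
df_houses["price_per_m2"] = df_houses["price_approx_usd"] / df_houses["surface_covered_in_m2"]

# Column summary statistics are methods on each column (a Series)
mean_price = df_houses["price_approx_usd"].mean()

# Convert to a row-wise list of dicts and back again
rows = df_houses.to_dict(orient="records")
df_from_rows = pd.DataFrame(rows)

print(df_houses.shape)
print(round(mean_price, 2))
```

The DataFrame constructor accepts both the column-wise dictionary and the row-wise list of dictionaries built earlier in this notebook.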