# Housing Market

### Introduction:

This time we will create our own dataset with fictional numbers to describe a housing market. Because the data are random, don't try to find meaning in the numbers.

### Step 1. Import the necessary libraries

In [1]:
import pandas as pd
import numpy as np

### Step 2. Create 3 different Series, each of length 100, as follows:
1. The first holds random integers from 1 to 4
2. The second holds random integers from 1 to 3
3. The third holds random integers from 10,000 to 30,000

In [4]:
s1 = pd.Series(np.random.randint(1, high=5, size=100, dtype='l'))
s2 = pd.Series(np.random.randint(1, high=4, size=100, dtype='l'))
s3 = pd.Series(np.random.randint(10000, high=30001, size=100, dtype='l'))

# Init signature: pd.Series(self, data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)
# Docstring:
# One-dimensional ndarray with axis labels (including time series).

# Labels need not be unique but must be any hashable type. The object
# supports both integer- and label-based indexing and provides a host of
# methods for performing operations involving the index. Statistical
# methods from ndarray have been overridden to automatically exclude
# missing data (currently represented as NaN)

# Operations between Series (+, -, /, *, **) align values based on their
# associated index values-- they need not be the same length. The result
# index will be the sorted union of the two indexes.

# Parameters
# ----------
# data : array-like, dict, or scalar value
#     Contains data stored in Series
# index : array-like or Index (1d)
#     Values must be unique and hashable, same length as data. Index
#     object (or other iterable of same length as data) Will default to
#     RangeIndex(len(data)) if not provided. If both a dict and index
#     sequence are used, the index will override the keys found in the
#     dict.
# dtype : numpy.dtype or None

print(s1.head(10), s2.head(10), s3.head(10))

0    3
1    2
2    1
3    3
4    3
5    3
6    3
7    2
8    4
9    4
dtype: int64 0    1
1    3
2    3
3    2
4    3
5    1
6    2
7    3
8    3
9    2
dtype: int64 0    14310
1    24632
2    19317
3    10108
4    13767
5    23748
6    17814
7    25773
8    25990
9    26418
dtype: int64
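
The numbers above change on every run. If reproducible values are wanted (the exercise does not require this), one option is NumPy's seeded `Generator`. This is a minimal sketch; the seed `42` and the `_seeded` names are arbitrary choices made for illustration:

```python
import numpy as np
import pandas as pd

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(42)

# `high` is exclusive here, just as in np.random.randint.
s1_seeded = pd.Series(rng.integers(1, high=5, size=100))
s2_seeded = pd.Series(rng.integers(1, high=4, size=100))
s3_seeded = pd.Series(rng.integers(10000, high=30001, size=100))

# Check the ranges rather than trusting them.
print(s1_seeded.min(), s1_seeded.max())
print(s3_seeded.min(), s3_seeded.max())
```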


### Step 3. Let's create a DataFrame by joining the Series by column

In [5]:
housemkt = pd.concat([s1, s2, s3], axis=1)
housemkt.head(10)

   0  1      2
0  3  1  14310
1  2  3  24632
2  1  3  19317
3  3  2  10108
4  3  3  13767
5  3  1  23748
6  3  2  17814
7  2  3  25773
8  4  3  25990
9  4  2  26418
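
The three Series line up cleanly because they share the same default index, 0 to 99. With `axis=1`, `pd.concat` matches rows by index label, not by position. The toy Series `a` and `b` below are made up to show what happens when the labels only partly overlap:

```python
import pandas as pd

a = pd.Series([1, 2, 3])                      # index 0, 1, 2
b = pd.Series([10, 20, 30], index=[1, 2, 3])  # index 1, 2, 3

# Rows are matched by label; labels missing on one side become NaN.
aligned = pd.concat([a, b], axis=1)
print(aligned)
```

The result has four rows (labels 0 to 3), with one NaN in each column.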


### Step 4. Change the name of the columns to bedrs, bathrs, price_sqr_meter

In [6]:
housemkt.rename(columns = {0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'}, inplace=True)

# Signature: housemkt.rename(index=None, columns=None, **kwargs)
# Docstring:
# Alter axes input function or functions. Function / dict values must be
# unique (1-to-1). Labels not contained in a dict / Series will be left
# as-is. Alternatively, change ``Series.name`` with a scalar
# value (Series only).

# Parameters
# ----------
# index, columns : scalar, list-like, dict-like or function, optional
#     Scalar or list-like will alter the ``Series.name`` attribute,
#     and raise on DataFrame or Panel.
#     dict-like or functions are transformations to apply to
#     that axis' values
# copy : boolean, default True
#     Also copy underlying data
# inplace : boolean, default False
#     Whether to return a new DataFrame. If True then value of copy is
#     ignored.

# Returns
# -------
# renamed : DataFrame (new object)

# See Also
# --------
# pandas.NDFrame.rename_axis

# Examples
# --------
# >>> s = pd.Series([1, 2, 3])
# >>> s
# 0    1
# 1    2
# 2    3
# dtype: int64
# >>> s.rename("my_name") # scalar, changes Series.name
# 0    1
# 1    2
# 2    3
# Name: my_name, dtype: int64
# >>> s.rename(lambda x: x ** 2)  # function, changes labels
# 0    1
# 1    2
# 4    3
# dtype: int64
# >>> s.rename({1: 3, 2: 5})  # mapping, changes labels
# 0    1
# 3    2
# 5    3
# dtype: int64
# >>> df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
# >>> df.rename(2)
# ...
# TypeError: 'int' object is not callable
# >>> df.rename(index=str, columns={"A": "a", "B": "c"})
#    a  c
# 0  1  4
# 1  2  5
# 2  3  6

housemkt.head()

   bedrs  bathrs  price_sqr_meter
0      3       1            14310
1      2       3            24632
2      1       3            19317
3      3       2            10108
4      3       3            13767
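
`inplace=True` is only one way to do the rename. Here is a short sketch of two alternatives, using a hypothetical two-row frame `demo` in place of `housemkt`:

```python
import pandas as pd

demo = pd.DataFrame([[3, 1, 14310], [2, 3, 24632]])  # columns 0, 1, 2

# Without inplace=True, rename returns a new DataFrame and leaves demo alone.
renamed = demo.rename(columns={0: 'bedrs', 1: 'bathrs', 2: 'price_sqr_meter'})

# Assigning a full list of names replaces every column label at once.
relabeled = demo.copy()
relabeled.columns = ['bedrs', 'bathrs', 'price_sqr_meter']

print(list(demo.columns))
print(list(renamed.columns))
```

The dictionary form is safer when only some columns change. The list form must name every column, in order.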


### Step 5. Create a one-column DataFrame with the values of the 3 Series and assign it to 'bigcolumn'

In [8]:
# concatenate the values of the three Series end to end (axis=0 stacks rows)
bigcolumn = pd.concat([s1, s2, s3], axis=0)

# it is still a Series, so we need to transform it to a DataFrame
bigcolumn = bigcolumn.to_frame()

# Series.to_frame(name=None)
# Convert Series to DataFrame

# Parameters:
# name : object, default None
#     The passed name should substitute for the series name (if it has one).
# Returns:
# data_frame : DataFrame

print(type(bigcolumn))

bigcolumn

<class 'pandas.core.frame.DataFrame'>


   0
0  3
1  2
2  1
3  3
4  3
5  3
6  3
7  2
8  4
9  4
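
The resulting column is labelled `0` because the concatenated Series has no name. The `name` parameter of `to_frame` can supply a label during the conversion. In this sketch the toy Series and the label `'value'` are illustrative choices, not part of the exercise:

```python
import pandas as pd

first = pd.Series([3, 2, 1])
second = pd.Series([1, 3])

# axis=0 stacks the Series end to end; each keeps its own index labels.
stacked = pd.concat([first, second], axis=0)

# name sets the label of the single resulting column.
frame = stacked.to_frame(name='value')
print(frame)
```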


### Step 6. Oops, it seems the index only goes up to 99. Is that true?

In [9]:
# No: each Series keeps its own 0-99 index, so the labels repeat, but the DataFrame has 300 rows
len(bigcolumn)

300
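
The repeated labels can be inspected directly. The sketch below uses three hypothetical two-element Series in place of `s1`, `s2` and `s3`, so every label appears three times:

```python
import pandas as pd

parts = [pd.Series([3, 2]), pd.Series([1, 3]), pd.Series([14310, 24632])]
stacked = pd.concat(parts, axis=0).to_frame()

print(len(stacked))             # 6 rows in total
print(stacked.index.is_unique)  # False: labels 0 and 1 each occur three times
print(stacked.index.max())      # 1, even though there are 6 rows
print(stacked.loc[0])           # selecting label 0 returns three rows
```

So the largest index label says nothing about the number of rows. `len` or `shape` is the reliable check.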

### Step 7. Reset the index of the DataFrame so it goes from 0 to 299

In [10]:
bigcolumn.reset_index(drop=True, inplace=True)

# Signature: bigcolumn.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')
# Docstring:
# For DataFrame with multi-level index, return new DataFrame with
# labeling information in the columns under the index names, defaulting
# to 'level_0', 'level_1', etc. if any are None. For a standard index,
# the index name will be used (if set), otherwise a default 'index' or
# 'level_0' (if 'index' is already taken) will be used.

# Parameters
# ----------
# level : int, str, tuple, or list, default None
#     Only remove the given levels from the index. Removes all levels by
#     default
# drop : boolean, default False
#     Do not try to insert index into dataframe columns. This resets
#     the index to the default integer index.
# inplace : boolean, default False
#     Modify the DataFrame in place (do not create a new object)
# col_level : int or str, default 0
#     If the columns have multiple levels, determines which level the
#     labels are inserted into. By default it is inserted into the first
#     level.
# col_fill : object, default ''
#     If the columns have multiple levels, determines how the other
#     levels are named. If None then the index name is repeated.

# Returns
# -------
# resetted : DataFrame

bigcolumn

   0
0  3
1  2
2  1
3  3
4  3
5  3
6  3
7  2
8  4
9  4
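
Another way to get the same result is to renumber the rows during concatenation by passing `ignore_index=True` to `pd.concat`. This is a sketch using hypothetical two-element Series, not the exercise's own solution:

```python
import pandas as pd

parts = [pd.Series([3, 2]), pd.Series([1, 3]), pd.Series([14310, 24632])]

# ignore_index=True discards the original labels and numbers rows 0..n-1.
renumbered = pd.concat(parts, axis=0, ignore_index=True).to_frame()
print(renumbered.index)
```

This skips the separate `reset_index` call. It is only suitable when the original labels carry no information worth keeping.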
