# Introduction

Pandas DataFrames are two-dimensional data structures with labeled rows and columns that can hold many data types. If you are familiar with Excel, you can think of a Pandas DataFrame as being similar to a spreadsheet. We can create Pandas DataFrames manually or by loading data from a file. In this lesson, we will start by learning how to create Pandas DataFrames manually from dictionaries, and later we will see how to load data into a DataFrame from a data file.

## Create a DataFrame manually

We will start by creating a DataFrame manually from a dictionary of Pandas Series. It is a two-step process:

1. The first step is to create the dictionary of Pandas Series.
2. After the dictionary is created, we pass it to the `pd.DataFrame()` function.

We will create a dictionary that contains items purchased by two people, Alice and Bob, on an online store. The Pandas Series will use the prices of the items purchased as *data*, and the purchased items will be used as the *index* labels of the Pandas Series. Let's see how this is done in code:

In [2]:
# import Pandas as pd into Python
import pandas as pd

# create a dictionary of Pandas Series
items = {'Bob':pd.Series(index=['bike','pants','watch'], data=[25,25,55]), 
         'Alice':pd.Series(index=['book','glasses','bike','pants'], data=[40,110,500,45])}

# print the type of items to see that it is a dictionary
print(type(items))

<class 'dict'>
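
To make the roles of *data* and *index* concrete, here is a minimal sketch that builds just one of the Series from the dictionary above and inspects it. The variable name `bob` is introduced only for this illustration.

```python
import pandas as pd

# One person's purchases: item names are the index labels, prices are the data
bob = pd.Series(index=['bike', 'pants', 'watch'], data=[25, 25, 55])

# The index holds the item labels
print(bob.index.tolist())

# The values hold the prices
print(bob.values.tolist())

# A price can be looked up by its item label
print(bob['watch'])
```

Each value of the `items` dictionary is a Series of this kind, and each key is the name of the person who made the purchases.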


Now that we have a dictionary, we are ready to create a DataFrame by passing it to the `pd.DataFrame()` function. We will create a DataFrame that could represent the shopping carts of various users; in this case we have only two users, Alice and Bob.

#### Example 1. Create a DataFrame using a dictionary of Series

In [4]:
# Create a Pandas DataFrame by passing it a dictionary of Pandas Series
shopping_carts = pd.DataFrame(items)

# Display the DataFrame
shopping_carts

| | Bob | Alice |
|---|---|---|
| **bike** | 25.0 | 500.0 |
| **book** | NaN | 40.0 |
| **glasses** | NaN | 110.0 |
| **pants** | 25.0 | 45.0 |
| **watch** | 55.0 | NaN |


There are several things to notice here:
1. DataFrames are displayed in tabular form, much like an Excel spreadsheet, with the labels of rows and columns in **bold**.
2. The row labels of the DataFrame are built from the union of the index labels of the two Pandas Series we used to construct the dictionary, and the column labels of the DataFrame are taken from the *keys* of the dictionary.
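3. Another thing to notice is the ordering. In the output above, the columns follow the order of the dictionary keys (Bob, then Alice), while the row labels come out in alphabetical order. Older versions of Pandas arranged the columns alphabetically as well, so the column order you see can depend on your version. We will see later that a DataFrame loaded from a data file keeps the column order of the file.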
4. The last thing we want to point out is that some `NaN` values appear in the DataFrame. `NaN` stands for *Not a Number*, and is Pandas' way of indicating that it doesn't have a value for that particular row and column index. For example, if we look at Alice's column, we see that it has `NaN` at the watch index. You can see why this is the case by looking at the dictionary we created at the beginning: it has no item labeled watch for Alice. So whenever a DataFrame is created, if a particular column doesn't have a value for a particular row index, Pandas will put a `NaN` value there.
5. If we were to feed this data into a machine learning algorithm, we would have to remove these `NaN` values first. In a later lesson, we will learn how to deal with `NaN` values and clean our data. For now, we will leave these values in our DataFrame.
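
The points in the list above can be checked directly. The following is a minimal sketch, assuming the same `items` dictionary as before; `all_labels` and `missing_per_column` are names introduced here for illustration, and `isna()` is just one of several ways to locate missing values.

```python
import pandas as pd

items = {'Bob': pd.Series(index=['bike', 'pants', 'watch'], data=[25, 25, 55]),
         'Alice': pd.Series(index=['book', 'glasses', 'bike', 'pants'], data=[40, 110, 500, 45])}
shopping_carts = pd.DataFrame(items)

# Row labels: the union of the index labels of both Series
all_labels = set(items['Bob'].index) | set(items['Alice'].index)
print(set(shopping_carts.index) == all_labels)

# Column labels: the keys of the dictionary
print(set(shopping_carts.columns) == set(items.keys()))

# Boolean mask that is True wherever a value is missing
print(shopping_carts.isna())

# Number of missing values in each column
missing_per_column = shopping_carts.isna().sum()
print(missing_per_column)
```

Bob never bought a book or glasses and Alice never bought a watch, so the counts of missing values per column match the empty spots in the table.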

In the example above, we created a Pandas DataFrame from a dictionary of Pandas Series that had clearly defined indexes. If we don't provide index labels to the Pandas Series, Pandas will use numerical row indexes when it creates the DataFrame. Let's see an example:

#### Example 2. DataFrame assigns the numerical row indexes by default

In [5]:
# create a dictionary of Pandas Series without indexes
data = {'Bob': pd.Series([245,25,55]),
        'Alice': pd.Series([40,110, 500,45])}

# create a DataFrame
df = pd.DataFrame(data)

# display the DataFrame
df

| | Bob | Alice |
|---|---|---|
| **0** | 245.0 | 40 |
| **1** | 25.0 | 110 |
| **2** | 55.0 | 500 |
| **3** | NaN | 45 |


We can see that Pandas indexes the rows of the DataFrame starting from 0, just like NumPy indexes ndarrays.
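
The output of Example 2 also shows Bob's values with a decimal point while Alice's do not. A minimal sketch of why, assuming the same `data` dictionary: Bob's Series is one element shorter, so row 3 is filled with `NaN`, and because `NaN` is a floating-point value the whole column is stored as floats.

```python
import pandas as pd

data = {'Bob': pd.Series([245, 25, 55]),
        'Alice': pd.Series([40, 110, 500, 45])}
df = pd.DataFrame(data)

# Default row labels are the integers 0, 1, 2, 3
print(df.index.tolist())

# Bob's column holds floats because of the NaN; Alice's column stays integer
print(df.dtypes)

# Exactly one value is missing in Bob's column
print(df['Bob'].isna().sum())
```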

Now, just like with Pandas Series, we can also extract information from DataFrames using attributes. Let's print some information about our `shopping_carts` DataFrame:

#### Example 3. Demonstrate a few attributes of DataFrame

In [6]:
# print some information about shopping_carts
print('shopping_carts has shape: ', shopping_carts.shape)
print('shopping_carts has dimension: ', shopping_carts.ndim)
print('shopping_carts has a total of: ', shopping_carts.size, ' elements')
print()
print('The data in the shopping_carts is: \n', shopping_carts.values)
print()
print('The row index in shopping_carts is : \n', shopping_carts.index)
print()
print('The column index in shopping_carts is : \n', shopping_carts.columns)

shopping_carts has shape:  (5, 2)
shopping_carts has dimension:  2
shopping_carts has a total of:  10  elements

The data in the shopping_carts is: 
 [[ 25. 500.]
 [ nan  40.]
 [ nan 110.]
 [ 25.  45.]
 [ 55.  nan]]

The row index in shopping_carts is : 
 Index(['bike', 'book', 'glasses', 'pants', 'watch'], dtype='object')

The column index in shopping_carts is : 
 Index(['Bob', 'Alice'], dtype='object')
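
These attributes are related to one another. As a small sketch, assuming the same `items` dictionary, the snippet below unpacks `shape` into a row count and a column count (the names `n_rows` and `n_cols` are introduced here for illustration) and checks them against `size` and `ndim`.

```python
import pandas as pd

items = {'Bob': pd.Series(index=['bike', 'pants', 'watch'], data=[25, 25, 55]),
         'Alice': pd.Series(index=['book', 'glasses', 'bike', 'pants'], data=[40, 110, 500, 45])}
shopping_carts = pd.DataFrame(items)

# shape is a (rows, columns) tuple
n_rows, n_cols = shopping_carts.shape

# size counts every cell, including the NaN entries
print(shopping_carts.size == n_rows * n_cols)

# ndim is the number of axes, which is the length of shape
print(shopping_carts.ndim == len(shopping_carts.shape))

# values is a NumPy array with the same shape as the DataFrame
print(shopping_carts.values.shape == shopping_carts.shape)
```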


When creating the `shopping_carts` DataFrame, we passed the entire dictionary to the `pd.DataFrame()` function. However, there might be cases when you are only interested in a subset of the data. Pandas allows us to select which data we want to put into our DataFrame by means of the `columns` and `index` keywords. Let's see some examples:

In [7]:
# create a DataFrame that only has Bob's data
bob_shopping_cart = pd.DataFrame(items, columns=['Bob'])

# display bob_shopping_cart
bob_shopping_cart

| | Bob |
|---|---|
| **bike** | 25 |
| **pants** | 25 |
| **watch** | 55 |
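
The `index` keyword selects rows in the same way that `columns` selects columns, and the two can be combined. The following is a minimal sketch, assuming the same `items` dictionary; the variable names `sel_shopping_cart` and `alice_sel_shopping_cart` are chosen here only for illustration.

```python
import pandas as pd

items = {'Bob': pd.Series(index=['bike', 'pants', 'watch'], data=[25, 25, 55]),
         'Alice': pd.Series(index=['book', 'glasses', 'bike', 'pants'], data=[40, 110, 500, 45])}

# Keep only the selected rows, for every person
sel_shopping_cart = pd.DataFrame(items, index=['pants', 'book'])
print(sel_shopping_cart)

# Keep only the selected rows of Alice's column
alice_sel_shopping_cart = pd.DataFrame(items, index=['glasses', 'bike'], columns=['Alice'])
print(alice_sel_shopping_cart)
```

The rows appear in the order given in `index`, and Bob's entry for the book is `NaN` because he did not buy one.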