# Introduction to NumPy and Pandas

## What is NumPy?

NumPy is a Python package for fast, scalable numerical computation. Its core feature is a special array object that stores data in less memory, and can be read and modified faster, than an ordinary Python list. NumPy also provides a large collection of mathematical functions for linear algebra, Fourier transforms, and more. Today we will mainly focus on the array object.

## Installation of Packages
Before we dive into NumPy, we first need to install the package and then import it. To install packages you need pip, a tool that makes it easy to install the many packages found on the Python Package Index (PyPI). Your computer may already have pip, but since we're working in Python 3, you want pip3, the pip that belongs to your Python 3 installation.

Go to your terminal and type:

```terminal
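python3 -m ensurepip --upgrade
```
This installs (or upgrades to) the copy of pip bundled with your Python 3 installation, including the `pip3` command. On some Linux distributions pip comes from the system package manager instead, for example `sudo apt install python3-pip`. Now, whenever you want to install a package, type in the terminal: 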

```terminal
pip3 install insert_package_name_here
```
Or, to install a package from inside a Jupyter Notebook, type: 
```terminal
!pip3 install insert_package_name_here
```
Try installing 'numpy' on your own!

## Importing Packages

You only have to install a package once, using the terminal. However, installing a package does not automatically make it available in your program; you first need to import it. Below we import NumPy under the abbreviated name np, so anytime we use a NumPy array or function we can type np instead of the full name (numpy). 

In [2]:
# Note that there is no output when you import a package
import numpy as np

## NumPy Arrays
The NumPy array is a homogeneous, multidimensional array. Let's break that down. 

1. Homogeneous 
    * Every element in a NumPy array is of the same datatype. 
        * As a result, you can save much more memory than by storing the same data in a Python list
    * They can be all integers, all floats, or all strings. 
    * What happens if there's a mix of datatypes in your NumPy array? 
        * You'll experiment shortly and see what happens.

2. Multidimensional
    * You can have 1D NumPy arrays, 2D NumPy matrices, and more! 
    * Multidimensional NumPy arrays are much easier to work with than nested Python lists, as you will see

3. Array
    * An array is an ordered group of elements stored in contiguous memory. 
    * Python's core language has no native array type for numbers (the standard library's `array` module exists, but it is rarely used for numerical work).
        * Python's main sequence type, the list, stores pointers to separate objects that can be scattered all across memory!
    * NumPy arrays store their elements in one contiguous block of memory. 
        * The result is much faster access to elements!

![NUMPY ARRAY VS. PYTHON LIST](array_vs_list.png)


For more information on how NumPy arrays work "under the hood," check out the [documentation](https://docs.scipy.org/doc/numpy/reference/internals.html).

Jake VanderPlas, one of the core contributors to the machine learning package scikit-learn, created the image above. He also wrote a great article explaining why Python is generally slow, linked [here](https://jakevdp.github.io/blog/2014/05/09/why-python-is-slow/). 
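To see the memory difference for yourself, here is a small sketch that compares the bytes used by a NumPy array with a rough estimate for a list holding the same integers. It assumes a standard CPython build; the list figure varies by Python version and platform, so only the NumPy numbers are fixed.

```python
import sys

import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# NumPy keeps the 1000 int64 values in one contiguous buffer: 1000 * 8 bytes
print(arr.nbytes)                  # 8000
print(arr.itemsize)                # 8 (bytes per element)
print(arr.flags["C_CONTIGUOUS"])   # True

# The list stores 1000 pointers, and each int is a separate Python object.
# This rough estimate adds the list itself to the size of every element object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes)                  # larger; the exact value depends on the Python build
```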


Let's actually get started!

## Creating a NumPy Array

In [None]:
# There are many ways to create an array! 

# Creating an array from a list
lis = [[1,2,3],[4,5,6]]
arr = np.array(lis)
print(arr)
print(type(arr))

print("\n")  # \n creates a new line break
print() # same with just an empty print function

# Or creating an array using the arange function
arr2 = np.arange(8)
print(arr2)
arr2 = np.arange(1, 9)
print(arr2)
arr2 = np.arange(1, 9, 2) 
print(arr2)
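`np.array` and `np.arange` are only two of NumPy's array constructors. The sketch below shows a few other common ones; the shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3))          # 2x3 array filled with 1.0
sevens = np.full((2, 2), 7)     # 2x2 array filled with 7
identity = np.eye(3)            # 3x3 identity matrix
points = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, both ends included

print(ones)
print(sevens)
print(identity)
print(points)
```

Unlike `arange`, which takes a step size, `linspace` takes the number of points you want, which makes it easier to hit exact endpoints with floats.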


The **shape** of a 2D array is the number of rows by the number of columns in the array. 

The shape is an attribute of a NumPy array object, so you can read it with dot notation (`arr.shape`) or with the `np.shape()` function.


In [None]:
print("The shape of a 2D array is " + str(arr.shape))
# Try replacing 'arr.shape' with 'np.shape(arr)' in the print statement


If we want to change the shape of an array, we can use the **reshape** function.

In [None]:
arr2 = np.arange(1,9).reshape(2,4)
print(arr2)
print()
arr2 = np.arange(1,9).reshape(2,-1) # What happens if one dimension is -1?
print(arr2)

You can use the **zeros** function to create an array of placeholder zeros with a given shape.

In [None]:
arr3 = np.zeros((3,5))
print(arr3)

### Explore:
Use the next cell to briefly experiment with the shapes of different 1D and 3D arrays.
Also figure out what the attributes 'size' and 'ndim' do without looking them up in the documentation.

In [None]:
# Practice

In [None]:
# We can also stack arrays on top of each other or next to each other
x = np.arange(0,10,2)                     
y = np.arange(5)   
print(x)
print(y)
print()

# vstack needs arrays of the same length (number of columns); hstack simply joins 1D arrays end to end
xTopOfY = np.vstack([x,y])  
print(xTopOfY)
print() 
xNextToY = np.hstack([x,y])   
print(xNextToY)
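The stacking functions above are conveniences for joining arrays along an axis. This sketch contrasts `hstack`, `vstack`, and `np.column_stack` on 1D inputs, and shows that 1D arrays of different lengths can be joined end to end but not stacked as rows. The arrays `x`, `y`, and `z` are illustrative.

```python
import numpy as np

x = np.arange(0, 10, 2)   # [0 2 4 6 8]
y = np.arange(5)          # [0 1 2 3 4]

# For 1D inputs, hstack is the same as concatenating end to end
print(np.concatenate([x, y]))

# vstack treats each 1D array as a row, so the lengths must agree
rows = np.vstack([x, y])
print(rows.shape)         # (2, 5)

# column_stack puts each 1D array in its own column instead
cols = np.column_stack([x, y])
print(cols.shape)         # (5, 2)

# Arrays of different lengths can be hstacked, but not vstacked
z = np.arange(3)
print(np.hstack([x, z]).shape)   # (8,)
try:
    np.vstack([x, z])
except ValueError as err:
    print("vstack failed:", err)
```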

## NumPy Array Datatypes

An important property of NumPy arrays is that every element has the same data type.
We can use the **dtype** attribute to find out which datatype the array holds.

In [3]:
arr = np.arange(1,9).reshape(2,4)
print(arr)
print("Every element in the array has datatype: " + str(arr.dtype))

[[1 2 3 4]
 [5 6 7 8]]
Every element in the array has datatype: int64


You can also force every element to a particular type by passing a `dtype` argument

In [4]:
arr = np.array([[1,2], [3,4]], dtype=float)
print(arr)
print("Every element in the array has datatype: " + str(arr.dtype))

[[ 1.  2.]
 [ 3.  4.]]
Every element in the array has datatype: float64
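Passing a type at creation time is one option; to convert an array that already exists, NumPy arrays have an `astype` method. A minimal sketch with made-up values:

```python
import numpy as np

floats = np.array([[1.7, 2.2], [3.9, 4.0]])

# astype returns a new array; converting float to int truncates toward zero
ints = floats.astype(np.int64)
print(ints)
print(ints.dtype)     # int64
print(floats.dtype)   # float64, the original array is unchanged
```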


### Explore:
Use the next cell to figure out what happens if you have an array with a mixture of types (float and int), (int and string), etc. What is the resulting datatype of the array? The behavior that you will observe is called **upcasting**. 

In [None]:
# Explore code here

## Iterating Through NumPy Arrays

In [5]:
# Now let's iterate through a NumPy array
print(arr)
print("\n")

for row in arr:
    print(row)
print("\n")

for col in arr.T: # T means transpose (switching rows and columns)
    print(col)
print("\n")
 
for element in arr.flatten(): # flatten returns a 1D copy of a multidimensional array
    print(element)


[[ 1.  2.]
 [ 3.  4.]]


[ 1.  2.]
[ 3.  4.]


[ 1.  3.]
[ 2.  4.]


1.0
2.0
3.0
4.0


Arithmetic and comparison operations on NumPy arrays are applied element-wise.

In [6]:
a = np.arange(5, 35, 5).reshape((2,3))
print(a, "\n")

b = np.array([[3, 2, 4], [8, 7, 14]])
print(b, "\n")

print(a-b)

print("\n")

print(a>15)

[[ 5 10 15]
 [20 25 30]] 

[[ 3  2  4]
 [ 8  7 14]] 

[[ 2  8 11]
 [12 18 16]]


[[False False False]
 [ True  True  True]]
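Element-wise operations also work when the two operands have different but compatible shapes, a NumPy feature called **broadcasting**. In this sketch a scalar, a length-3 row, and a 2x1 column are each stretched to match `a`'s 2x3 shape; the `row` and `col` values are arbitrary.

```python
import numpy as np

a = np.arange(5, 35, 5).reshape((2, 3))   # [[5, 10, 15], [20, 25, 30]]

# A scalar is combined with every element
print(a * 2)

# A 1D array of length 3 is applied to each row
row = np.array([1, 0, -1])
print(a + row)

# A (2, 1) column is applied across each column
col = np.array([[100], [200]])
print(a + col)
```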


## NumPy Indexing 

My favorite part of NumPy is its powerful indexing for selecting elements from an array.

Let's create a list of the squares of 1 through 12 using a **list comprehension** (a compact way of building a list).



In [7]:

squares = [(i+1)**2 for i in range(12)]
squares

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144]

With a plain Python list, selecting several elements by index is a bit clunky.

Say we want to print the first, fifth, ninth, and fourth elements in that order. 

In [8]:
print([squares[0], squares[4], squares[8], squares[3]]) 

[1, 25, 81, 16]


In NumPy, we can use an array of indices to select for certain values in an array.

In [9]:
squares_arr = np.array(squares)
indices = np.array([0,4,8,3]) # indices is just an array of integers
print(squares_arr[indices])

[ 1 25 81 16]


We can use **Boolean indexing** to select the values in squares_arr that are greater than 20

In [11]:
print(squares_arr>20)

# Boolean indexing keeps only the elements where the mask is True
print(squares_arr[squares_arr>20])


[False False False False  True  True  True  True  True  True  True  True]
[ 25  36  49  64  81 100 121 144]
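Boolean masks can be combined with `&` (and), `|` (or), and `~` (not). Python's `and`/`or` keywords do not work element-wise on arrays, and each comparison needs its own parentheses because `&` and `|` bind more tightly than `>` or `==`. A short sketch on the same squares:

```python
import numpy as np

squares_arr = np.array([(i + 1) ** 2 for i in range(12)])

# Squares strictly between 20 and 100
between = squares_arr[(squares_arr > 20) & (squares_arr < 100)]
print(between)

# Squares that are odd OR greater than 100
odd_or_big = squares_arr[(squares_arr % 2 == 1) | (squares_arr > 100)]
print(odd_or_big)

# ~ flips a mask: everything that is NOT greater than 20
print(squares_arr[~(squares_arr > 20)])
```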


If the values are all next to each other, we can use slicing to get the values too! 

In [12]:
print(squares_arr[2:7])

[ 9 16 25 36 49]
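Slices also accept a step and negative indices. One detail worth knowing: unlike slicing a list (which copies), slicing a NumPy array returns a *view* of the same memory, so writing to the slice changes the original. This sketch rebuilds `squares_arr` so it runs on its own.

```python
import numpy as np

squares_arr = np.array([(i + 1) ** 2 for i in range(12)])

print(squares_arr[::3])    # every third element
print(squares_arr[-3:])    # the last three elements

# A slice is a view, so writing through it changes the original array
window = squares_arr[2:7]
window[0] = -1
print(squares_arr[2])      # -1

# Use .copy() when you want an independent array
safe = squares_arr[2:7].copy()
safe[0] = 999
print(squares_arr[2])      # still -1
```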


In [15]:
squares_arr = squares_arr.reshape((3,4))
print(squares_arr)
print("\n")

# Remember everything is 0-indexed
# 3rd row, 3rd col
print(squares_arr[2,2])
print("\n")

# printing all entries in the fourth column
print(squares_arr[:,3])
print()

# Paired row and column indices select the elements at (0, 1) and (2, 2)
row_indices = np.array([0,2])
col_indices = np.array([1,2])
print(squares_arr[row_indices, col_indices])

# Try printing 49, 81, 16 in one list using the above array indexing technique
row_indices = np.array([1, 2, 0])
col_indices = np.array([2, 0, 3])
# .tolist() converts to plain Python ints for a clean printout
print(squares_arr[row_indices, col_indices].tolist())

[[  1   4   9  16]
 [ 25  36  49  64]
 [ 81 100 121 144]]


121


[ 16  64 144]

[  4 121]
[49, 81, 16]
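Paired index arrays pick out individual elements, one (row, column) pair at a time. If you instead want every combination of some rows and some columns (a sub-grid), `np.ix_` builds the right index arrays. A small sketch on the same 3x4 grid of squares:

```python
import numpy as np

grid = np.array([(i + 1) ** 2 for i in range(12)]).reshape((3, 4))

# Paired index arrays pick individual elements: (0, 1) and (2, 2)
print(grid[[0, 2], [1, 2]])

# np.ix_ selects the full sub-grid: rows {0, 2} x columns {1, 2}
print(grid[np.ix_([0, 2], [1, 2])])

# Slices work per axis too: first two rows, last two columns
print(grid[:2, -2:])
```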


## Wrapping Up NumPy

We just finished the basics of NumPy. Although we didn't cover everything NumPy has to offer, you can see that it beats a list in speed, memory, and flexibility. Now we move on to Pandas, which is built on top of NumPy, so a good knowledge of NumPy will help you understand how Pandas works. 

## What is Pandas? 
Pandas is a package that makes data manipulation and analysis in Python easy through two data structures: Series (1D) and DataFrames (2D). Pandas is fast and powerful because it is built on the NumPy arrays we learned about earlier. 

Let's start!

## Series

A **Series** is a one-dimensional labeled array: typically a NumPy array of values paired with an index of labels. The data can be of any type (integers, strings, dictionaries, etc.), and the index labels are usually unique. In most cases, the labels are strings, integers, or dates. Series are the building blocks of DataFrames, which we'll talk about very soon.

Here we'll show a couple of ways to create a Series object. 

### Creating a Series from a list
In Pandas, strings (values made of characters, possibly mixed with digits) are stored with the datatype **object**

If a Series holds a mix of numbers and strings, its datatype will also be object

In [16]:
import pandas as pd

s1 = pd.Series(["First", "Second", "Third", "Fourth"])
print(s1)

0     First
1    Second
2     Third
3    Fourth
dtype: object


In [17]:
# Unless we specify an index, the default labels are integers starting from 0 (as in s1 above); here we supply our own labels and a name

s2 = pd.Series([1,2,3,4], index = ["a", "b", "c","d"], name="Sample")
print(s2)

a    1
b    2
c    3
d    4
Name: Sample, dtype: int64


### Creating a Series from a dictionary

When an index label has no associated value, it is assigned **NaN** ("Not a Number")

NaN is how Pandas marks a missing value

In [18]:
d = {'even1' : 2, 'even2' : 4, 'even3' : 6}
s3 = pd.Series(d, index=['even1', 'even2', 'even3', 'even4'])
print()
print(s3)


even1    2.0
even2    4.0
even3    6.0
even4    NaN
dtype: float64


Notice that the dictionary had only three items, but I passed four index labels, so there was no value for the fourth one. Pandas fills any missing values with NaN. Because NaN is a floating-point value, the whole Series was also upcast from integers to float64. This is important because you will likely have to deal with missing (NaN) values when cleaning datasets.

In [28]:
# print(s2[:3])
# print()
# print(s2[['a', 'c']])
# print()
print(s2[s2 > s2.mean()])

c    3
d    4
Name: Sample, dtype: int64
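A Series can also be indexed explicitly by label with `.loc` or by integer position with `.iloc` (the same accessors we'll use on DataFrames later), which avoids any ambiguity between the two. A sketch using the `s2` Series from above; note that label slices include their endpoint while position slices do not.

```python
import pandas as pd

s2 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"], name="Sample")

print(s2.loc["b"])          # by label
print(s2.iloc[0])           # by integer position
print(s2.loc[["a", "c"]])   # a list of labels returns a Series
print(s2.loc["b":"d"])      # label slice: includes "d"
print(s2.iloc[1:3])         # position slice: positions 1 and 2 only ("b", "c")
```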


The key difference between a Series and a NumPy array is that operations between Series are matched up by index label, not by position. The result covers the **union** of both indexes, and any label that exists in only one of the Series gets NaN.

In [29]:
x = pd.Series([1,2,3,4], index = ["a", "b", "c","d"])
y = pd.Series([5,6,7,8], index = ["b", "c", "d","e"])
z = x + y # Adds based on the index 
print(z)

a     NaN
b     7.0
c     9.0
d    11.0
e     NaN
dtype: float64
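If you would rather treat a missing label as 0 than get NaN, arithmetic methods such as `Series.add` take a `fill_value` argument. A sketch with the same `x` and `y`, next to the alternative of replacing the NaN results afterwards with `fillna`:

```python
import pandas as pd

x = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
y = pd.Series([5, 6, 7, 8], index=["b", "c", "d", "e"])

# fill_value substitutes 0 for the missing operand before adding
z = x.add(y, fill_value=0)
print(z)

# fillna replaces the NaN results after the addition
print((x + y).fillna(0))
```

The two give different answers: with `fill_value=0`, `a` becomes 1.0 and `e` becomes 8.0, while `fillna(0)` turns both into 0.0.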


## DataFrames
A DataFrame is a two-dimensional Pandas data structure with labeled rows and columns. All columns share the same row index, and each column of a DataFrame is itself a Series. There are many, many ways of creating a DataFrame. We'll go over a couple of them here, and you'll learn more as you use Pandas. 

Let's create a DataFrame using a dictionary of Series

In [30]:
# Dictionary containing the data
# Keys will be the column names and the values will be the Series 
d = {'Model':pd.Series(["Civic", "Camry", "Elantra"]),'Price':pd.Series([699.99, 999.99, 799.99]) }
cars_df = pd.DataFrame(d)
print(cars_df)
print()

# Here each Series is given an explicit index, so the car company names become the row labels
# The Series are lined up by these labels when the DataFrame is built
d = {'Model':pd.Series(["Civic", "Camry", "Elantra"], index=["Honda", "Toyota", "Hyundai"]),
         "Price":pd.Series([699.99, 999.99, 799.99], index=["Honda", "Toyota", "Hyundai"])}

cars_df = pd.DataFrame(d)

print(cars_df)

     Model   Price
0    Civic  699.99
1    Camry  999.99
2  Elantra  799.99

           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99


In [34]:
# You can also create a DataFrame using a dictionary of NumPy arrays or lists too!
d = {'Model':["Civic", "Camry", "Elantra"], "Price":[699.99, 999.99, 799.99]}

cars_df = pd.DataFrame(d, index = ["Honda", "Toyota", "Hyundai"])

print(cars_df)

           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99
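Another common way to build a DataFrame is from a list of dictionaries, one per row, where the keys become column names. A sketch with the same three cars (the variable names are illustrative):

```python
import pandas as pd

# One dictionary per row; the keys become column names
rows = [
    {"Model": "Civic", "Price": 699.99},
    {"Model": "Camry", "Price": 999.99},
    {"Model": "Elantra", "Price": 799.99},
]
cars_from_rows = pd.DataFrame(rows, index=["Honda", "Toyota", "Hyundai"])

print(cars_from_rows)
print(cars_from_rows.shape)   # (rows, columns)
print(cars_from_rows.dtypes)
```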


In [41]:
# You can add a row by assigning to a new label with .loc
cars_df.loc["Ford"] = ["Focus", 899.99]
print(cars_df)

           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99
Ford       Focus  899.99


In [42]:
# drop returns a new DataFrame without the selected row; the original cars_df is unchanged
print(cars_df.drop('Ford'))
print(cars_df)

           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99
           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99
Ford       Focus  899.99


In [None]:
# With inplace=True, drop modifies the DataFrame itself and returns None
cars_df.drop('Ford', inplace=True)
print(cars_df)

Let's add some more cars to the dataframe


In [44]:
cars_df.loc["Ford"] = ["Focus", 899.99]
cars_df.loc["Mercedes"] = ["S Class", 1299.99]
cars_df.loc["Infiniti"] = ["Q60", 1099.99]
cars_df.loc["Nissan"] = ["370Z", 1499.99]

# Print the first 3 rows (head() shows 5 by default)
print(cars_df.head(3)) 
print()

# Print the last 4 rows (tail() also defaults to 5)
print(cars_df.tail(4))
print()

# Print all
print(cars_df)


           Model   Price
Honda      Civic  699.99
Toyota     Camry  999.99
Hyundai  Elantra  799.99

            Model    Price
Ford        Focus   899.99
Mercedes  S Class  1299.99
Infiniti      Q60  1099.99
Nissan       370Z  1499.99

            Model    Price
Honda       Civic   699.99
Toyota      Camry   999.99
Hyundai   Elantra   799.99
Ford        Focus   899.99
Mercedes  S Class  1299.99
Infiniti      Q60  1099.99
Nissan       370Z  1499.99


Adding, setting, and deleting columns works much like it does with a dictionary


In [45]:
cars_df["Quantity"] = [100, 200, 300, 400, 300, 100, 200]
cars_df["Door_Style"] = ["Sedan", "Sedan", "Sedan", "Sedan", "Coupe", "Coupe", "Coupe"]
cars_df["Revenue"] = cars_df["Price"]*cars_df["Quantity"]
print(cars_df)

            Model    Price  Quantity Door_Style   Revenue
Honda       Civic   699.99       100      Sedan   69999.0
Toyota      Camry   999.99       200      Sedan  199998.0
Hyundai   Elantra   799.99       300      Sedan  239997.0
Ford        Focus   899.99       400      Sedan  359996.0
Mercedes  S Class  1299.99       300      Coupe  389997.0
Infiniti      Q60  1099.99       100      Coupe  109999.0
Nissan       370Z  1499.99       200      Coupe  299998.0


In [46]:
# let's drop the 'Model' column 
del cars_df["Model"]
# can also use: cars_df.drop(columns='Model', inplace=True)
print(cars_df)
print()
# Let's reinsert the Model column again back in its original position
cars_df.insert(0, "Model",["Civic", "Camry", "Elantra", "Focus", "S Class", "Q60", "370Z"]) 
print(cars_df)

            Price  Quantity Door_Style   Revenue
Honda      699.99       100      Sedan   69999.0
Toyota     999.99       200      Sedan  199998.0
Hyundai    799.99       300      Sedan  239997.0
Ford       899.99       400      Sedan  359996.0
Mercedes  1299.99       300      Coupe  389997.0
Infiniti  1099.99       100      Coupe  109999.0
Nissan    1499.99       200      Coupe  299998.0

            Model    Price  Quantity Door_Style   Revenue
Honda       Civic   699.99       100      Sedan   69999.0
Toyota      Camry   999.99       200      Sedan  199998.0
Hyundai   Elantra   799.99       300      Sedan  239997.0
Ford        Focus   899.99       400      Sedan  359996.0
Mercedes  S Class  1299.99       300      Coupe  389997.0
Infiniti      Q60  1099.99       100      Coupe  109999.0
Nissan       370Z  1499.99       200      Coupe  299998.0


In [47]:
# Moving a column to a different position

rev = cars_df.pop("Revenue") # returns a column, which is removed from the data frame (Think CTRL-X)
cars_df.insert(3, "Revenue", rev) # CTRL-V
print(cars_df)

            Model    Price  Quantity   Revenue Door_Style
Honda       Civic   699.99       100   69999.0      Sedan
Toyota      Camry   999.99       200  199998.0      Sedan
Hyundai   Elantra   799.99       300  239997.0      Sedan
Ford        Focus   899.99       400  359996.0      Sedan
Mercedes  S Class  1299.99       300  389997.0      Coupe
Infiniti      Q60  1099.99       100  109999.0      Coupe
Nissan       370Z  1499.99       200  299998.0      Coupe


## Indexing a DataFrame
There are multiple ways of indexing a DataFrame. 

To get a column, put the column name in square brackets, or use '.' notation

In [48]:
print(cars_df["Model"])
print()
print(cars_df.Model)

# The output is a Series with the left-hand column as the index, and the right hand column as the column values

Honda         Civic
Toyota        Camry
Hyundai     Elantra
Ford          Focus
Mercedes    S Class
Infiniti        Q60
Nissan         370Z
Name: Model, dtype: object

Honda         Civic
Toyota        Camry
Hyundai     Elantra
Ford          Focus
Mercedes    S Class
Infiniti        Q60
Nissan         370Z
Name: Model, dtype: object
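Two related points about selecting columns: passing a list of names returns a DataFrame with the columns in the order you list them, and dot notation only works when the column name is a valid Python identifier. The `"Door Style"` column (with a space) in this small sketch is hypothetical, chosen to show the second point.

```python
import pandas as pd

cars = pd.DataFrame(
    {
        "Model": ["Civic", "Camry"],
        "Price": [699.99, 999.99],
        "Door Style": ["Sedan", "Sedan"],   # hypothetical column name containing a space
    },
    index=["Honda", "Toyota"],
)

# A list of names returns a DataFrame, with columns in the order given
print(cars[["Price", "Model"]])

# Bracket notation works for any name; cars.Door Style would be a syntax error
print(cars["Door Style"])
```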


To get a row, there are two main ways:
* .loc[label] selects the row by its index label
* .iloc[integer_position] selects the row by its integer position (starting at 0)


In [50]:
print(cars_df.loc["Toyota"])

print()

# In cars_df, Toyota is the second row, so its integer position is 1 (positions start at 0)
print(cars_df.iloc[1])


Model          Camry
Price         999.99
Quantity         200
Revenue       199998
Door_Style     Sedan
Name: Toyota, dtype: object

Model          Camry
Price         999.99
Quantity         200
Revenue       199998
Door_Style     Sedan
Name: Toyota, dtype: object


We can also select multiple rows by passing a list of labels.

Suppose we want info on just Honda and Toyota…

In [51]:
# Notice that the output is a DataFrame
print(cars_df.loc[ ["Honda", "Toyota"] ])
print()
# We can also get these rows with .iloc[] by slicing integer positions
print(cars_df.iloc[0:2])
print()
# A more convenient way of doing this is:
print(cars_df[0:2])

        Model   Price  Quantity   Revenue Door_Style
Honda   Civic  699.99       100   69999.0      Sedan
Toyota  Camry  999.99       200  199998.0      Sedan

        Model   Price  Quantity   Revenue Door_Style
Honda   Civic  699.99       100   69999.0      Sedan
Toyota  Camry  999.99       200  199998.0      Sedan

        Model   Price  Quantity   Revenue Door_Style
Honda   Civic  699.99       100   69999.0      Sedan
Toyota  Camry  999.99       200  199998.0      Sedan


Lastly, we can select rows using boolean indexing

In [52]:
cars_df["High_rev"] = cars_df["Revenue"]>150000
print(cars_df)
print()

# Let's select cars that only earn high revenues
print(cars_df.loc[(cars_df["High_rev"])])
print() 

# Let's select cars that only earn low revenues (~ negates a boolean Series)
print(cars_df.loc[~cars_df["High_rev"]])

            Model    Price  Quantity   Revenue Door_Style  High_rev
Honda       Civic   699.99       100   69999.0      Sedan     False
Toyota      Camry   999.99       200  199998.0      Sedan      True
Hyundai   Elantra   799.99       300  239997.0      Sedan      True
Ford        Focus   899.99       400  359996.0      Sedan      True
Mercedes  S Class  1299.99       300  389997.0      Coupe      True
Infiniti      Q60  1099.99       100  109999.0      Coupe     False
Nissan       370Z  1499.99       200  299998.0      Coupe      True

            Model    Price  Quantity   Revenue Door_Style  High_rev
Toyota      Camry   999.99       200  199998.0      Sedan      True
Hyundai   Elantra   799.99       300  239997.0      Sedan      True
Ford        Focus   899.99       400  359996.0      Sedan      True
Mercedes  S Class  1299.99       300  389997.0      Coupe      True
Nissan       370Z  1499.99       200  299998.0      Coupe      True

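          Model    Price  Quantity   Revenue Door_Style  High_rev
Honda     Civic   699.99       100   69999.0      Sedan     False
Infiniti    Q60  1099.99       100  109999.0      Coupe     False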

In [53]:
# We can also filter directly on the condition, without the helper column, in a single line:
cars_df.loc[cars_df.Revenue > 150000]

|  | Model | Price | Quantity | Revenue | Door_Style | High_rev |
|---|---|---|---|---|---|---|
| Toyota | Camry | 999.99 | 200 | 199998.0 | Sedan | True |
| Hyundai | Elantra | 799.99 | 300 | 239997.0 | Sedan | True |
| Ford | Focus | 899.99 | 400 | 359996.0 | Sedan | True |
| Mercedes | S Class | 1299.99 | 300 | 389997.0 | Coupe | True |
| Nissan | 370Z | 1499.99 | 200 | 299998.0 | Coupe | True |


We can select rows and columns at the same time

In [54]:
print("The number of Toyotas is: " + str(cars_df.loc["Toyota", "Quantity"]))
print()

# Printing Hyundai's model and price
print(cars_df.loc[["Hyundai"],["Model", "Price"]])
print()

# Printing the first two cars and the first three columns with iloc
print(cars_df.iloc[:2,:3])

The number of Toyotas is: 200

           Model   Price
Hyundai  Elantra  799.99

        Model   Price  Quantity
Honda   Civic  699.99       100
Toyota  Camry  999.99       200
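Row conditions and column selection combine naturally in `.loc[row_condition, columns]`, and several conditions can be joined with `&` (and) or `|` (or), each in its own parentheses. A small self-contained sketch on a subset of the cars, using the same 150000 revenue threshold as above:

```python
import pandas as pd

cars = pd.DataFrame(
    {
        "Model": ["Civic", "Camry", "Q60", "370Z"],
        "Revenue": [69999.0, 199998.0, 109999.0, 299998.0],
        "Door_Style": ["Sedan", "Sedan", "Coupe", "Coupe"],
    },
    index=["Honda", "Toyota", "Infiniti", "Nissan"],
)

# High-revenue coupes, showing only the Model and Revenue columns
mask = (cars["Revenue"] > 150000) & (cars["Door_Style"] == "Coupe")
print(cars.loc[mask, ["Model", "Revenue"]])

# isin checks membership in a list of values
print(cars.loc[cars["Model"].isin(["Civic", "370Z"]), "Revenue"])
```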


The fastest way to get and set a single (scalar) value is with the **at** or **iat** accessors
* at is for label-based lookups (like loc)
* iat is for integer-position lookups (like iloc)


In [55]:
print("The number of Toyotas is: " + str(cars_df.at["Toyota","Quantity"]))
print()

# we can change the value of a specific cell:
cars_df.iat[1, 2] = 300

print("The number of Toyotas is: " + str(cars_df.iat[1,2]))
print()
print(cars_df.head())

The number of Toyotas is: 200

The number of Toyotas is: 300

            Model    Price  Quantity   Revenue Door_Style  High_rev
Honda       Civic   699.99       100   69999.0      Sedan     False
Toyota      Camry   999.99       300  199998.0      Sedan      True
Hyundai   Elantra   799.99       300  239997.0      Sedan      True
Ford        Focus   899.99       400  359996.0      Sedan      True
Mercedes  S Class  1299.99       300  389997.0      Coupe      True


Let's wrap up today by writing our cars DataFrame to a CSV file. 

In [56]:
# Writes the file (including the index); to_csv returns None when given a file path
cars_df.to_csv("cars.csv")

## Other Operations

In [None]:
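# Importing data from a CSV file (here, the one we just wrote; its first column holds the index)
cars_from_csv = pd.read_csv("cars.csv", index_col=0)
print(cars_from_csv.head())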

In [None]:
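# Prints the mean of all the numeric columns
# (numeric_only=True skips text columns such as Model, which recent Pandas versions would otherwise reject)
print(cars_df.mean(numeric_only=True))
print()

# Print summary statistics
# Very useful for exploratory data analysis! 
print(cars_df.describe())
# In Jupyter, run `pd.DataFrame.describe?` in its own cell to read the documentation
print()

# You can apply lambda (anonymous) functions to certain columns in your DataFrame for quick manipulations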
print(cars_df["Model"].apply(lambda x: x.upper()))
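For common string operations, Pandas also offers vectorized methods under the `.str` accessor, so you don't need a lambda. A short sketch on a stand-alone Series of model names:

```python
import pandas as pd

models = pd.Series(["Civic", "Camry", "S Class"])

print(models.str.upper())            # same result as apply(lambda x: x.upper())
print(models.str.len())              # number of characters in each name
print(models.str.startswith("C"))    # boolean Series, usable as a mask
```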

In [None]:
# Querying / Filtering
print(cars_df)
print()
# Select car models that are 'coupe' styled and earn high revenue
print(cars_df.query('Door_Style=="Coupe" & High_rev'))
print()

In [None]:
# Grouping summaries
# (numeric_only=True skips text columns like Model, which can't be averaged)
print(cars_df.groupby('Door_Style').mean(numeric_only=True))
print()

# Just the mean of the revenues for each door_style
print(cars_df.groupby('Door_Style')["Revenue"].mean())
print()

# Multiple grouping options
print(cars_df.groupby(['Door_Style', "High_rev"]).mean(numeric_only=True))
print()

# Multiple grouping and selecting specific columns (pass the column names as a list)
print(cars_df.groupby(['Door_Style', "High_rev"])[['Price', 'Quantity']].mean())
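When you want different summaries for different columns, `groupby(...).agg` with named aggregations produces them in one call. A sketch on a small made-up subset of the cars; the output column names `mean_price` and `total_quantity` are arbitrary.

```python
import pandas as pd

cars = pd.DataFrame({
    "Door_Style": ["Sedan", "Sedan", "Coupe", "Coupe"],
    "Price": [699.99, 999.99, 1099.99, 1499.99],
    "Quantity": [100, 200, 100, 200],
})

# Each keyword names an output column: (input column, aggregation function)
summary = cars.groupby("Door_Style").agg(
    mean_price=("Price", "mean"),
    total_quantity=("Quantity", "sum"),
)
print(summary)
```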

In [None]:
# Binning data into categories
# Bin edges: 0, the 25th and 75th percentiles of Price, and the highest price
percentiles = [0, cars_df["Price"].quantile(0.25), cars_df["Price"].quantile(0.75), cars_df["Price"].max()]
price_type = ["Inexpensive", "Midrange", "Luxury"]

# The cut function is useful for taking continuous variables, like price, and making them categorical!
cars_df["Price_type"] = pd.cut(cars_df['Price'], percentiles, labels=price_type)
print(cars_df)

# pd.cut has already made "Price_type" an ordered categorical, with categories in the order of the labels

In [None]:
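# pd.cut already gave us an ordered categorical, so astype("category") leaves it unchanged here
# (on a column of plain strings, astype("category") is how you would create the categories; they allow faster operations under the hood)
# Notice the order of the categories in the printed output
cars_df["Price_type"] = cars_df["Price_type"].astype("category")
print(cars_df["Price_type"])
print()
print("Is the Series ordered? " + str(cars_df["Price_type"].cat.ordered))

# as_unordered() removes the ordering but keeps the same categories
cars_df["Price_type"] = cars_df["Price_type"].cat.as_unordered()
print("Is the Series ordered? " + str(cars_df["Price_type"].cat.ordered))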

In [None]:
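# If the categories had been created from plain strings (e.g. with astype("category")),
# they would be sorted alphabetically:
    # Inexpensive < Luxury < Midrange
# reorder_categories sets the order we want and makes the Series ordered again:
    # Inexpensive < Midrange < Luxury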
cars_df["Price_type"] = cars_df["Price_type"].cat.reorder_categories(["Inexpensive","Midrange","Luxury"], ordered=True)

print(cars_df["Price_type"])
print()
print(cars_df.sort_values(by="Price_type")) # Try changing "Price_type" to "Revenue"! 


In [None]:
print(cars_df["Price_type"].value_counts())
print()
print(cars_df["Price_type"].describe())
print()
print(cars_df.groupby("Price_type")["Revenue"].sum())
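Because `Price_type` is an *ordered* categorical, comparisons and min/max follow the category order rather than alphabetical order. A self-contained sketch; the bin edges here are made up for illustration rather than taken from the quartiles above.

```python
import pandas as pd

prices = pd.Series([699.99, 999.99, 1499.99])

# Hypothetical fixed bin edges: (0, 800], (800, 1200], (1200, 1500]
price_type = pd.cut(prices, bins=[0, 800, 1200, 1500],
                    labels=["Inexpensive", "Midrange", "Luxury"])

print(price_type.cat.ordered)        # pd.cut returns an ordered categorical by default
print(price_type > "Inexpensive")    # compares by category order
print(price_type.max())              # the "largest" category present
```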

## Exercises

Here is a link to a GitHub repository with lots of good Pandas exercises: https://github.com/guipsamora/pandas_exercises

## Sources:
Adapted from NumPy's QuickStart Tutorial
* https://docs.scipy.org/doc/numpy-dev/user/quickstart.html

Adapted from Pandas Documentation
* https://pandas.pydata.org/pandas-docs/stable/10min.html
* https://pandas.pydata.org/pandas-docs/stable/dsintro.html