### Importing Pandas

#### Packages
Packages provide additional tools and functions not present in base Python. Python ships with a number of packages, and the Anaconda distribution that we all downloaded for Unit 3 comes with the "Pandas" package already installed.

Once a package is installed, you can load it into your current Python session with the `import` statement. Until you do, its functions will not be available.


#### Pandas

Like spreadsheets in Microsoft Excel, Pandas allows us to store our data in tabular, multi-dimensional objects (dataframes) with familiar features like rows, columns, and headers. This is useful because it makes management, manipulation, and cleaning of large datasets much easier than would be the case using Python's built-in data structures such as lists. Pandas also provides a wide range of useful tools for working with data once it has been stored and structured.

Begin by importing the pandas package (along with NumPy, which pandas is built on) using the following commands:

In [2]:
import numpy as np
import pandas as pd
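
If you want to confirm that the import worked, a quick check along these lines can help. This is only a sketch: the name `check` and its two small columns are made up for illustration.

```python
import pandas as pd

# A successful import exposes the package under the alias "pd".
print(pd.__version__)

# Build a tiny DataFrame by hand to confirm pandas is working.
check = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(check.shape)  # (number of rows, number of columns)
```

If the import fails with a `ModuleNotFoundError`, the package is not installed in the environment your notebook is using.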

### Creating a DataFrame

#### Working Directories & Relative Paths

By now, you should have either downloaded the csv file "Apple.csv" from Canvas, or saved your own data as a csv file. I've stored my copy in the same folder as this Jupyter Notebook. **NOTE:** make sure that your csv file is saved in the same working directory as the .ipynb notebook file that you will use.

Remember that Jupyter Notebooks automatically set your working directory to the folder where the .ipynb is saved. You'll have to save the document at least once to set your directory, but once that is done you can use what are called relative file paths to access the files there.

If a file is located in your working directory, its relative path is just the name of the file!
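
One way to see this for yourself is to ask Python where the working directory is and how it resolves a bare file name. The sketch below uses only the standard library; the variable names are illustrative.

```python
from pathlib import Path

# Path.cwd() returns the working directory of the current Python session.
working_dir = Path.cwd()
print(working_dir)

# A bare file name is a relative path: it is resolved against the working directory.
csv_path = Path("Apple.csv")
print(csv_path.resolve())

# exists() reports whether the file is really there before you try to read it.
print(csv_path.exists())
```

If the last line prints `False`, the csv file is not in the same folder as your notebook.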

#### Using the `pd.read_csv()` function

`pd.read_csv` reads the tabular data from a Comma Separated Values (csv) file into a dataframe object that we'll define as `df`.

To create our dataframe, we'll define the object `df` by calling the `pd.read_csv()` function on our data file, with the relative file path placed inside the parentheses.

In [3]:
df=pd.read_csv("Apple.csv")
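
To see what `pd.read_csv` produces without depending on a file on disk, here is a minimal sketch that reads three rows held in memory. The rows are copied from the `head` output shown later in this notebook; `csv_text` and `demo_df` are illustrative names.

```python
import io
import pandas as pd

# Three rows in the same layout as Apple.csv, held in memory for illustration.
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
1980-12-12,0.128348,0.128906,0.128348,0.128348,0.101261,469033600.0
1980-12-15,0.122210,0.122210,0.121652,0.121652,0.095978,175884800.0
1980-12-16,0.113281,0.113281,0.112723,0.112723,0.088934,105728000.0
"""

# read_csv accepts a file path or a file-like object such as StringIO.
demo_df = pd.read_csv(io.StringIO(csv_text))

print(demo_df.shape)          # (rows, columns)
print(list(demo_df.columns))  # header names taken from the first line
print(demo_df.dtypes)         # the type pandas inferred for each column
```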

### Finding and Filtering Data
You are going to explore the Apple stock data set and go through logical steps to find out for how many years the Apple share price was $70 or more. First, though, you should get an idea of what the data is about so you can understand the content and structure of this data set.

The first step is to take a sample from the data set. Typing `df.sample(n=x)` gives you x randomly chosen rows of the data frame. In this example, try 40 for x.

In [107]:
df.sample(n=40)

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
8140,2013-03-22,16.235001,16.503571,16.182501,16.496786,14.465927,395105200.0
4016,1996-10-30,0.209821,0.214286,0.204241,0.204241,0.176493,257051200.0
1127,1985-05-29,0.076451,0.077009,0.076451,0.076451,0.060316,246556800.0
6817,2007-12-18,6.661428,6.690357,6.378572,6.535,5.647151,1222603000.0
566,1983-03-10,0.194754,0.196987,0.19029,0.191964,0.151451,112604800.0
1178,1985-08-09,0.06808,0.06808,0.06808,0.06808,0.053712,60950400.0
6031,2004-11-03,0.970893,1.001964,0.964107,0.987679,0.853492,1204174000.0
8575,2014-12-11,28.065001,28.450001,27.834999,27.905001,25.463926,165606800.0
3618,1995-04-05,0.304688,0.310268,0.301339,0.310268,0.265819,264857600.0
2900,1992-06-03,0.504464,0.504464,0.482143,0.483259,0.400285,300574400.0
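
Because `sample` is random, you will get different rows each time you run it. If you want a repeatable sample, one option is the `random_state` parameter. The sketch below uses a small made-up frame (`demo_df`) rather than the Apple data.

```python
import pandas as pd

# A small hypothetical frame standing in for the Apple data.
demo_df = pd.DataFrame({
    "Day": list(range(1, 11)),
    "High": [float(value) for value in range(10, 20)],
})

# n is the number of rows to draw; random_state fixes the seed so the draw repeats.
first_draw = demo_df.sample(n=4, random_state=0)
second_draw = demo_df.sample(n=4, random_state=0)

print(first_draw)
print(first_draw.equals(second_draw))  # same seed, same rows
```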


Now you are starting to see some numbers, such as Apple's opening stock price and the trading volume. Something that is very important for stock traders is the price a stock started trading at and the price it is at now. You will start by looking at the first 3 rows of the data by typing `print(df.head(3))`. The word "head" in your code simply means the beginning of your data frame. And as you have probably guessed, if "head" shows you the beginning, then "tail" shows you the end. We will start with `print(df.head(3))` and then try `print(df.tail(3))`.

In [119]:
print(df.head(3))

         Date      Open      High       Low     Close  Adj Close       Volume
0  1980-12-12  0.128348  0.128906  0.128348  0.128348   0.101261  469033600.0
1  1980-12-15  0.122210  0.122210  0.121652  0.121652   0.095978  175884800.0
2  1980-12-16  0.113281  0.113281  0.112723  0.112723   0.088934  105728000.0


In [109]:
print(df.tail(3))

             Date        Open        High         Low       Close   Adj Close  \
10013  2020-08-28  126.012497  126.442497  124.577499  124.807503  124.807503   
10014  2020-08-31  127.580002  131.000000  126.000000  129.039993  129.039993   
10015  2020-09-01  132.759995  134.800003  130.529999  134.179993  134.179993   

            Volume  
10013  187630000.0  
10014  225702700.0  
10015  151948100.0  


We can see that over 40 years the stock's daily high rose from 0.1289 in 1980 to 134.80 in 2020. That is roughly 1,045 times the 1980 value, an increase of about 104,000%.
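
The arithmetic behind a comparison like this is easy to check in Python. The two values below are the highs from the first and last rows of the output above; the variable names are illustrative.

```python
# Highs copied from the first row of head() and the last row of tail().
high_1980 = 0.128906
high_2020 = 134.800003

# Fold change: how many times larger the later value is.
fold_change = high_2020 / high_1980

# Percentage increase: the change relative to the starting value, times 100.
percent_increase = (high_2020 - high_1980) / high_1980 * 100

print(f"{fold_change:,.0f}x")
print(f"{percent_increase:,.0f}%")
```

Note that the fold change and the percentage increase are different numbers: a value that is 2 times its starting point has increased by 100%, not 200%.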

Apple stock is one of the most important stocks in the US market, as Apple is the biggest company in the world and there is a strong correlation between the market and the stock. Now that we know Apple reached a high of about $134 in 2020, how long was Apple able to sustain a price of $70 or more? Was it a one-day rise? One month? Or was it a year?

You will figure that out by entering `print(df["High"] >= 70)`. This code tells Python to look through the values in the "High" column and check whether each price was greater than or equal to 70.

In [115]:
print(df["High"] >= 70)

0        False
1        False
2        False
3        False
4        False
         ...  
10011     True
10012     True
10013     True
10014     True
10015     True
Name: High, Length: 10016, dtype: bool
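
To make the True/False output less mysterious, here is a small sketch of the same comparison on five highs taken from the outputs in this notebook. The names `demo_df` and `mask` are illustrative.

```python
import pandas as pd

# Five highs taken from the outputs shown in this notebook.
demo_df = pd.DataFrame({
    "High": [0.128906, 16.503571, 70.197502, 126.442497, 134.800003],
})

# Comparing a column to a number yields one True/False value per row.
mask = demo_df["High"] >= 70
print(mask)

# True counts as 1, so sum() gives the number of rows that pass the test.
print(mask.sum())
```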



### Weird! Notice anything strange?
### Why did you get it in this format and not just get rows where the "High" is greater than or equal to 70?
This is because the code compares every single value in "High" to 70 and assigns True or False to each row. To get the rows themselves, you have to wrap the whole comparison `(df["High"] >= 70)` in `df[ ]`. Doing so uses the True/False values to select rows from your data frame, and you end up with `print(df[(df["High"] >= 70)])`.
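
Here is a minimal sketch of that selection step on a five-row frame built from dates and highs that appear in this notebook's outputs. The names `demo_df`, `mask`, `filtered`, and `same_rows` are illustrative.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "Date": ["1980-12-12", "2013-03-22", "2019-12-16", "2020-08-28", "2020-09-01"],
    "High": [0.128906, 16.503571, 70.197502, 126.442497, 134.800003],
})

# Step 1: build the True/False series.
mask = demo_df["High"] >= 70

# Step 2: pass it inside df[...] to keep only the rows marked True.
filtered = demo_df[mask]
print(filtered)

# .loc with the same mask is an equivalent, more explicit spelling.
same_rows = demo_df.loc[mask]
print(filtered.equals(same_rows))
```

The filtered frame keeps the original row labels (2, 3, and 4 here), which is why the index in the real output starts at 9836 rather than 0.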
    

In [4]:
print(df[(df["High"] >= 70)])

             Date        Open        High         Low       Close   Adj Close  \
9836   2019-12-16   69.250000   70.197502   69.245003   69.964996   69.485619   
9837   2019-12-17   69.892502   70.442497   69.699997   70.102501   69.622192   
9838   2019-12-18   69.949997   70.474998   69.779999   69.934998   69.455833   
9839   2019-12-19   69.875000   70.294998   69.737503   70.004997   69.525352   
9840   2019-12-20   70.557503   70.662498   69.639999   69.860001   69.381348   
...           ...         ...         ...         ...         ...         ...   
10011  2020-08-26  126.180000  126.992500  125.082497  126.522499  126.522499   
10012  2020-08-27  127.142502  127.485001  123.832497  125.010002  125.010002   
10013  2020-08-28  126.012497  126.442497  124.577499  124.807503  124.807503   
10014  2020-08-31  127.580002  131.000000  126.000000  129.039993  129.039993   
10015  2020-09-01  132.759995  134.800003  130.529999  134.179993  134.179993   

            Volume  
9836  

In [7]:
Price_Filter=df[(df["High"] >= 70)]

### Results!
We can now finally see the answer: the first row in the filtered output is dated 2019-12-16, so Apple's daily high has been greater than or equal to 70 for less than one year, from December 2019 into 2020.
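
Rather than reading the years off the printed output, you could count the filtered rows per calendar year. The sketch below is one possible approach, shown on five rows copied from the filtered output above; `price_filter_demo`, `dates`, and `days_per_year` are illustrative names.

```python
import pandas as pd

# Five rows copied from the filtered output above.
price_filter_demo = pd.DataFrame({
    "Date": ["2019-12-16", "2019-12-17", "2020-08-28", "2020-08-31", "2020-09-01"],
    "High": [70.197502, 70.442497, 126.442497, 131.000000, 134.800003],
})

# Convert the text dates to datetimes so the year can be extracted.
dates = pd.to_datetime(price_filter_demo["Date"])

# Count how many filtered rows fall in each calendar year.
days_per_year = dates.dt.year.value_counts().sort_index()
print(days_per_year)

# The earliest and latest dates show the span the filter covers.
print(dates.min().date(), dates.max().date())
```

Applying the same steps to the full `Price_Filter` frame would give the number of trading days per year with a high of 70 or more.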

This suggests that Apple, even though it is considered a safe stock, is very volatile.

Since Apple and the stock market are strongly correlated, you can assume that the stock market behaves in a similar manner!

But why assume? Now that you have worked through this example, go and find a data set for the US stock market and filter it just as we did here!

In [12]:
Price_Filter.to_csv("Price_Filter.csv", index=False)
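
The `index=False` argument in the cell above keeps the row labels out of the saved file. The sketch below shows the difference using in-memory buffers instead of files on disk; all names in it are illustrative.

```python
import io
import pandas as pd

demo_df = pd.DataFrame({
    "Date": ["2020-08-31", "2020-09-01"],
    "High": [131.0, 134.800003],
})

# With the default index=True, the row labels are written as an unnamed first column.
with_index = io.StringIO()
demo_df.to_csv(with_index)
with_index.seek(0)
columns_with_index = list(pd.read_csv(with_index).columns)
print(columns_with_index)

# With index=False, only the real columns are written.
without_index = io.StringIO()
demo_df.to_csv(without_index, index=False)
without_index.seek(0)
columns_without_index = list(pd.read_csv(without_index).columns)
print(columns_without_index)
```

A column named "Unnamed: 0" in a data frame is usually a sign that the csv file was saved with its index included.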