# How to make data and influence people

This short course is intended to get you up and running on how to get any data you want.
Our focus here is on downloading and scraping files from EDGAR, but it extends to any dataset on the web (or off).

**Overview**

  1. [Installing your Environment](#Installing-your-Environment)
     1. [Downloading Anaconda](#Downloading)
     1. [Installing Anaconda](#Installing)
     1. [Installing Git](#Installing-Git)
  1. [Testing things](#Testing-things)
  1. [Setting up your Environment](#Setting-up-your-Environment)
  1. [Extra Credit](#Extra-Credit)

## Installing your Environment
These instructions will set you up with a computer that is ready to download all the data!

### Jargon

Before we start installing, let's introduce some jargon.

**terminal**: A text-entry method of controlling your computer. We use this to run Python programs, because a `.py` file isn't like an executable (`.exe`) that you can just double-click and run. Well, it can be, but that's for another day.
   - If you use a Windows computer, I suggest you use [PowerShell](https://github.com/PowerShell/PowerShell) as your terminal.
   - If you use macOS, I suggest you use [iTerm2](https://www.iterm2.com/) as your terminal.
   - If you use Linux, you don't need my advice.
   
**Python**: The best language to program in, in my naive opinion. It is not a compiled language, meaning you don't need to know anything fancy to run it, other than typing `python program.py` into your *terminal*.

**Anaconda**: a huge installer (an `.exe` on Windows) you can download from [here](https://www.anaconda.com/distribution/), which contains *Python*, but also nearly every library that you might find useful for data-science work.

**conda**: a *Python* program you run from your *terminal*, which installs packages for you.

**pip**: Same as conda, but harder to use and covering a broader set of packages.

**git**: A way of saving off versions of your code over time. Integrated into [GitHub](https://github.com), which allows for public sharing of really anything, but usually code (WARNING: also an instant trigger word for Brian Cadman).
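
To make the *Python* entry above concrete, here is a minimal sketch of what a `program.py` might contain. The file name comes from the example in that entry; the `greeting` function and its message are made up purely for illustration.

```python
# program.py (hypothetical contents, for illustration only)
import sys


def greeting(name="world"):
    """Build the message that the script prints."""
    return f"Hello, {name}!"


if __name__ == "__main__":
    # Use the first command-line argument as the name, if one was given.
    name = sys.argv[1] if len(sys.argv) > 1 else "world"
    print(greeting(name))
```

Save it as `program.py`, then type `python program.py` (or `python program.py EDGAR`) into your *terminal* once everything below is installed.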

### Downloading

We will be installing [Anaconda](https://www.anaconda.com/distribution/) Python, and then some specific libraries to help us get EDGAR documents a bit more easily.

#### Downloading on Windows

  1. Download the Anaconda executable from here: [www.anaconda.com/distribution/](https://www.anaconda.com/distribution/#download-section)
     1. You want the Windows 64 bit version
     1. You also want the Python 3.7 version.
     1. It should look like this:

  <img src="img/anaconda_windows.png" alt="Download Anaconda: Windows 64bit Python 3.7" style="width: 80%;"/> 

#### Downloading on macOS

  1. Download the Anaconda executable from here: [www.anaconda.com/distribution/](https://www.anaconda.com/distribution/#download-section)
     1. You want the macOS 64 bit version
     1. You also want the Python 3.7 version.
     1. It should look like this (click the red boxes):

  <img src="img/anaconda_osx.png" alt="Download Anaconda: macOS 64bit Python 3.7" style="width: 80%;"/>

#### Downloading on Linux

  1. Download the Anaconda executable from here: [www.anaconda.com/distribution/](https://www.anaconda.com/distribution/#download-section)
     1. You want the Linux 64 bit version
     1. You also want the Python 3.7 version.
     1. It should look like this (click the red boxes):

  <img src="img/anaconda_linux.png" alt="Download Anaconda: Linux 64bit Python 3.7" style="width: 80%;"/>

### Installing

I have images for installing on Windows; let's hope the other two platforms are similar.
My general approach is to accept the defaults, so that should work on the other platforms too.
I do have a note about Linux below.

  1. Run the installer.
  <img src="img/installation_1.png" alt="Installer Welcome" style="width: 500px;"/> 
  
  2. Accept the EULA if you so choose
  <img src="img/installation_2.png" alt="Installer EULA" style="width: 500px;"/>
  
  3. Choose who to install it for. I suggest just your User.
  <img src="img/installation_3.png" alt="Installer Scope" style="width: 500px;"/>
  
  4. Choose installation location. I like `$HOME/Anaconda`, but the default `Anaconda3` is great too.
  <img src="img/installation_4.png" alt="Installer Location" style="width: 500px;"/>
  
  5. Choose whether to set environment variables. I suggest adding Anaconda to your PATH, but if you are wary, feel free not to!
  <img src="img/installation_5.png" alt="Installer Env Vars" style="width: 500px;"/>
     NOTE: You will only get this pop-up if you already have Anaconda installed and checked the second box. If you don't see it, don't panic; that's a good thing.
  <img src="img/installation_5p.png" alt="Installer Error Box" style="width: 300px;"/>
  
  6. The install will proceed. Great job, you're now done!
  <img src="img/installation_6.png" alt="Installing" style="width: 500px;"/>

**Note on Linux installation**: The Linux download is a `.sh` file, so you must make it executable and run it from your shell. For example:

```bash
# Keep the installer in ~/sources (created if it does not exist yet)
mkdir -p ~/sources
cd ~/sources
wget https://repo.anaconda.com/archive/Anaconda3-2019.07-Linux-x86_64.sh
chmod u+x Anaconda3-2019.07-Linux-x86_64.sh
./Anaconda3-2019.07-Linux-x86_64.sh
```

Then follow the defaults, which should be similar to the Windows ones above.

### Installing Git

#### on Windows

Download [git](https://gitforwindows.org/) and install it, accepting defaults.

#### on macOS

If you have Xcode installed, you already have git.
If not, download and install the graphical installer from [here](https://sourceforge.net/projects/git-osx-installer/files/).

#### on Linux

```shell
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git
```


## Testing things

You should now have both Python and Git installed, so let's test them!

Step 1) Open your terminal. Instructions: 
     [Windows](https://docs.microsoft.com/en-us/powershell/scripting/getting-started/starting-windows-powershell?view=powershell-6),
     [macOS (search iTerm)](https://support.apple.com/en-us/HT204014),
     [Linux](https://help.ubuntu.com/community/UsingTheTerminal#Starting_a_terminal)

Step 2) Type the following (or copy and paste):

```bash
python --version
conda --version
pip --version
git --version
jupyter --version
```

On Windows, I get:

```bash
$ python --version
Python 3.7.2

$ conda --version
conda 4.7.11

$ pip --version
pip 19.0.1 from C:\Users\gaulinmp\Anaconda\lib\site-packages\pip (python 3.7)

$ git --version
git version 2.20.1.windows.1

$ jupyter --version
jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
ipython          : 7.7.0
```

On Linux, I get:

```bash
$ python --version
Python 3.7.3

$ conda --version
conda 4.7.11

$ pip --version
pip 19.1.1 from /home/gaulinmp/anaconda/lib/python3.7/site-packages/pip (python 3.7)

$ git --version
git version 2.17.1

$ jupyter --version
jupyter core     : 4.5.0
jupyter-notebook : 6.0.0
ipython          : 7.7.0
```

If none of these five commands throws an error, everything is awesome! You're done!

Note: You may see more things beneath `jupyter --version`. Don't worry if you do, I did too. I just deleted them, because really all we care about is the core jupyter, notebook, and ipython.
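
If you would rather run all five checks in one go, the same test can be sketched in Python. This is only a convenience sketch, not part of the course code: the `tool_version` helper is a made-up name, and it assumes Python 3.7 or later (for `capture_output`).

```python
import shutil
import subprocess

# The five commands tested above.
TOOLS = ["python", "conda", "pip", "git", "jupyter"]


def tool_version(tool):
    """Return the first line printed by `<tool> --version`, or None if the tool is not on the PATH."""
    path = shutil.which(tool)
    if path is None:
        return None
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"could not run: {exc}"
    # Some tools print their version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else "(no output)"


if __name__ == "__main__":
    for tool in TOOLS:
        version = tool_version(tool)
        print(f"{tool:8s} {version if version is not None else 'NOT FOUND'}")
```

Any line that says `NOT FOUND` points at the tool whose installation (or PATH setting) needs another look.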

## Setting up your Environment

Now that you have everything installed, it's time to add a few more conveniences that will make our lives easier.

### VSCode

At some point, everyone needs to write a text file or a python file.
I believe the best editor out there right now to do this in is [VS Code](https://code.visualstudio.com/).
I suggest you install that, then install a few extensions (Open VSCode and click on the red-boxed icon below, then search where it says *Search Extensions*):

<img src="img/vscode_extensions.png" alt="VSCode Extensions" style="width: 500px;"/>

  1. **Magic Python**: highlights the Python syntax. It's very pretty.
  1. **Python**: Adds lots of features for programming in Python. It's mandatory.
  1. **SAS Language**: Adds syntax highlighting for writing SAS code.
  1. **Stata Enhanced**: Adds syntax highlighting for writing Stata code.
  1. **Visual Studio IntelliCode**: Offers code-completion. It's amazing, just hit tab.
  
*Note*: I use VSCode for SAS and Stata, and in VSCode, I select lines I want to run, hit Ctrl-Alt-N, and those lines of code are run in SAS or Stata, just like if I were using their built in editors. If you want to set this up, drop by my office some time.

### Extra Python Libraries

When we scrape EDGAR documents, it requires downloading them, parsing them, and extracting out data.
That first step can be kind of intimidating.
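
To give a feel for the last two steps (parsing and extracting), here is a toy sketch that works on a made-up string instead of a downloaded filing. The sample text only loosely imitates the header of an EDGAR filing, the company and numbers are invented, and the helper names `parse_header` and `extract_dollar_amounts` are hypothetical; this is not how the library below does it.

```python
import re

# Made-up sample that loosely imitates an EDGAR full-text filing.
SAMPLE_FILING = """\
<SEC-HEADER>
CONFORMED SUBMISSION TYPE:   10-K
COMPANY CONFORMED NAME:      EXAMPLE CORP
CENTRAL INDEX KEY:           0000000000
</SEC-HEADER>
<DOCUMENT>
Total revenue was $1,234 million.
</DOCUMENT>
"""


def parse_header(text):
    """Parse upper-case 'KEY: value' lines into a dictionary."""
    header = {}
    for line in text.splitlines():
        match = re.match(r"^\s*([A-Z][A-Z ]+):\s+(\S.*)$", line)
        if match:
            header[match.group(1).strip()] = match.group(2).strip()
    return header


def extract_dollar_amounts(text):
    """Extract every dollar amount (such as $1,234) found in the text."""
    return re.findall(r"\$[\d,]+(?:\.\d+)?", text)


if __name__ == "__main__":
    print(parse_header(SAMPLE_FILING))
    print(extract_dollar_amounts(SAMPLE_FILING))
```

Real filings are far messier than this sample, which is exactly why it helps to lean on a library.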

To solve this, I've written a library on GitHub that facilitates downloading and parsing filings. So let's get that installed.

Step 1) Open terminal.

Step 2) Make a directory where the code might live. I like ~/sources/:

```bash
cd ~
mkdir sources
cd sources
git clone https://github.com/gaulinmp/pyedgar.git
cd pyedgar
pip install -e ./
```

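That's it! Don't forget the `-e`: it installs the library in "editable" mode, pointing Python at this folder, so that you can `git pull` new changes as they come out without reinstalling. This doesn't have to make sense now; it may some day.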
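
To confirm that the install worked, one option is to ask Python whether it can find the package. This is a minimal sketch using only the standard library; it assumes the package is imported under the same name as the repository (`pyedgar`), and `is_installed` is a made-up helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    name = "pyedgar"  # assumption: the import name matches the repository name
    print(f"{name}: {'found' if is_installed(name) else 'NOT found'}")
```

If it reports `NOT found`, make sure you ran `pip install -e ./` from inside the `pyedgar` folder, using the same Python that Anaconda installed.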

## Extra Credit

You may have noticed that these instructions are in an odd format.
That's because it's actually a Jupyter Notebook.
This is python 'code' you're reading!

So for extra credit, let's open it up and take a look.

In terminal (if ~/sources exists, otherwise wherever you like to put it):

```bash
cd ~/sources/
git clone https://github.com/gaulinmp/edgar_shortcourse.git
cd edgar_shortcourse
jupyter notebook ./notebooks/
```

By default, that should start a Jupyter Notebook server (the program that runs notebooks) and launch a browser window. If not, go to [http://localhost:8888](http://localhost:8888).

You should see `1_Installation.ipynb`, which you can click on.
You will now be in the notebook environment.

Scroll all the way down to the bottom here (or click the Extra Credit link up top), and run the next cell.

You run cells by clicking in the cell, and either pressing the Run button up top, or holding Shift and pressing Enter (Ctrl-Enter and Alt-Enter also work!).

```python
print("Hello World!")
```

```
Hello World!
```

```python
print("I am an amazing Python hacker already. 1337.")
```

```
I am an amazing Python hacker already. 1337.
```

```python
for i in range(10):
    print(i, end=', ')
```

```
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 
```

```python
list_comprehension_is_amazing = [f'string {i}' for i in range(10, 30, 10)]
print(list_comprehension_is_amazing)
```

```
['string 10', 'string 20']
```

```python
with open('1_Installation.ipynb', 'r') as file_pointer:
    txt = file_pointer.read()

with_open = txt.find('"with open')
print_close = txt.find('print_close])')
print(txt[with_open:print_close+63])
```

```python
from IPython.display import IFrame
IFrame("https://www.youtube.com/embed/5drjr9PmTMA?rel=0&amp;showinfo=0",
       width=560, height=415)
```