# Data Insights with Python for Beginners: Next Steps

<i>By Jennifer Walker

Please email me --- jenfly (at) gmail (dot) com --- with any questions, comments, or suggestions. I'd love to hear from you!
</i>

## Learning more Python basics

Here are a few resources you can check out, which are either free or have some free content you can sample. Several of them are oriented specifically towards data analysis.

* https://www.codecademy.com/learn/learn-python
* http://thepythonguru.com/
* https://www.datacamp.com/courses/intro-to-python-for-data-science
* https://www.dataquest.io/course/python-programming-beginner

## Python for data analysis

<i>The suggestions below are geared towards folks who already do a lot of data analysis in Excel or Google Sheets and are keen to switch over to Python for these tasks. If you're just learning Python for fun, or as an introduction to coding in general, the approach outlined here is probably not the best for you. Instead, you might want to keep playing around with the PyCharm setup that we used in the workshop and explore other resources online to learn more Python basics. And if your ultimate goal with Python is some other type of work, such as web development, this approach might not be the best fit either, and you'll want to explore other options. If you're an avid data cruncher, then read on!</i>

I always say Python is like a <b>"choose your own adventure"</b> because there are many different ways of installing it and setting up your environment for coding (e.g. PyCharm vs. Jupyter notebooks), and when you're writing code there are usually many different ways to accomplish a task. This flexibility makes it easy to do things exactly the way you want, but it can also be overwhelming when you're just starting out and don't know where to go next!

Here I'm going to recommend some options based on what has worked well for me for data analysis, but this certainly isn't the only way to do things. As you explore and experiment with different options, you may find that something else works better for you, so don't feel obligated to do everything the way I've suggested here.

### Setting up your computer

As we saw in the workshop, Python comes with many handy built-in functions, like `print()`, `type()`, and `int()`. It also comes with some handy <b>built-in libraries</b> that we can use (<i>in your workshop slides, see the ones titled "Using a Library"</i>). A library (also called a "package") is a collection of Python code that is grouped together and given a name (e.g. `csv`). After we import a library (e.g. `import csv`), we can use all the pieces of code inside it, like functions (e.g. `csv.DictReader()`).
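As a quick refresher, here is a minimal sketch of importing the built-in `csv` library and using `csv.DictReader()`. The sample data (names and scores) is made up for illustration, and it is kept in memory with `io.StringIO` so the example runs without needing a file on disk.

```python
import csv
import io

# A tiny made-up CSV "file" held in memory (hypothetical data for illustration).
sample_data = io.StringIO("name,score\nAda,90\nGrace,85\n")

# csv.DictReader reads each row as a dictionary keyed by the column headers.
reader = csv.DictReader(sample_data)
rows = list(reader)

for row in rows:
    # Values are read in as strings, so we use the built-in int() to convert the score.
    print(row["name"], int(row["score"]))
```

With a real file, you would pass the opened file to `csv.DictReader()` instead of the `io.StringIO` object.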

But if you're doing some serious data crunching, you're going to need some additional libraries that don't automatically come with the Python that was already on your computer (if you're on a Mac) or which you downloaded from https://www.python.org/ (if you're on Windows). These are called <b>3rd party libraries</b>, and they're free and available for anyone to use. There are many libraries that are useful for data analysis; three of the main ones are `numpy`, `pandas`, and `matplotlib`.

<img src="numpy.jpeg" width="200px"> 
<img src="pandas_logo.png" width="300px"> 
<img src="matplotlib.png" width="250px">

To use 3rd party libraries, you need to install them in such a way that Python can find them. This can be a bit tricky, and there are a few different ways of doing it. Here I will show one approach, which I think is the easiest one for folks who want to crunch their data in Python.
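If you're curious whether Python can already find a particular library on your computer, here is one possible way to check, using the built-in `importlib` library. This is just a sketch: the helper name `is_installed` is my own invention for this example, and the results will depend on your setup.

```python
import importlib.util


def is_installed(library_name):
    """Return True if Python can find a library with this name."""
    # find_spec returns None when Python cannot locate the library.
    return importlib.util.find_spec(library_name) is not None


# csv is built in, so it should be found; the other three are 3rd party libraries.
for name in ["csv", "numpy", "pandas", "matplotlib"]:
    if is_installed(name):
        print(name, "-> found")
    else:
        print(name, "-> not found")
```

If any of the 3rd party libraries show up as "not found", that just means they haven't been installed for the Python you are currently running.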

#### Anaconda

For data analysis in Python, I recommend installing <b>Anaconda</b>. This program comes with everything you need to get started with data analysis, so that you can dive in and start learning `numpy`, `pandas`, `matplotlib`, etc., without having to worry about the details of how to find, install, and manage all these libraries.  Anaconda is a "Python distribution", which means it includes the <b>standard Python</b>, like we used in the workshop, <b>plus some extra stuff</b>: 
* all the most common 3rd party libraries for data analysis are pre-installed so you don't need to install them yourself, and 
* it includes a package manager that you can use if you need to install any additional libraries.

<img src="anaconda.png" width="200px">

You can download Anaconda here: https://www.anaconda.com/download/. You'll need to select either Python 3.6 or Python 2.7 --- here's the first choice in the choose your own adventure! Either one is fine. I recommend Python 3.6 unless you already know that you need Python 2 for some reason. You can always install another Python version later if you need it, and multiple versions can co-exist peacefully without interfering with each other.
* Installation instructions for Windows: https://docs.anaconda.com/anaconda/install/windows
* Installation instructions for Mac: https://docs.anaconda.com/anaconda/install/mac-os
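
If you're not sure which version of Python you're currently running, the built-in `sys` library can tell you. Here is a small sketch; `describe_python` is just a name I made up for this example.

```python
import sys


def describe_python():
    """Return a short description of the running Python version."""
    # sys.version_info holds the major, minor, and micro version numbers.
    version = sys.version_info
    return "Python {}.{}.{}".format(version.major, version.minor, version.micro)


print(describe_python())
```

The first number tells you whether you're on Python 2 or Python 3.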

You'll need about <b>3 GB of free disk space</b> to install Anaconda. If disk space is an issue, there is another option here: https://conda.io/miniconda.html. This is a bare-bones installation requiring very little disk space, but it is a bit more complicated because you would then have to install any libraries you need yourself.

#### Using Anaconda with PyCharm

Since we used PyCharm as our development environment in the workshop, let's see how to set up PyCharm to use our newly installed Anaconda version of Python, instead of the default Python that it was using before. To do this, you need to change the project interpreter.

<i>TODO: Create another Jupyter notebook, `pycharm-anaconda.ipynb`, and link to it here. That notebook should contain screenshots, e.g. trying to import pandas without changing the interpreter, and then again after changing it.</i>

For the changes to take effect, you'll need to exit PyCharm and restart it.
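
After restarting, one way to see which Python interpreter is running your code is to print `sys.executable`. The sketch below also makes a rough guess about whether it is the Anaconda version, based on an assumption: that the install folder has "conda" somewhere in its name (as the default `anaconda3` or `miniconda3` folders do). The helper name `looks_like_anaconda` is hypothetical, and the guess will be wrong if you installed Anaconda into a folder with a different name.

```python
import sys


def looks_like_anaconda(interpreter_path):
    """Rough guess: does this path point into an Anaconda or Miniconda folder?"""
    # Assumption: the install folder name contains "conda" (e.g. anaconda3).
    return "conda" in interpreter_path.lower()


# sys.executable is the full path of the Python interpreter running this code.
# It can be empty or None in unusual setups, so fall back to an empty string.
interpreter = sys.executable or ""
print("Python interpreter:", interpreter)
print("Looks like Anaconda?", looks_like_anaconda(interpreter))
```

If the path still points to your old Python, the project interpreter change probably hasn't taken effect yet.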

### Analyzing data

* NumPy, pandas, Matplotlib
* Jupyter notebook
* Anaconda Navigator

### Additional resources

* <i>Python for Data Analysis</i> by Wes McKinney (book): https://www.amazon.ca/Python-Data-Analysis-Wrangling-IPython/dp/1491957662/
* Code and data that accompany the book: https://github.com/wesm/pydata-book