#   Python Training Day 3

# Day 2 Review / Q&A

# <font color = '#526520'> Other Helpful packages for DataScience Workflows </font>

Each one of these packages is a tool. Some work well together (e.g. NumPy and SciPy); others can be used as separate building blocks to create complex workflows.


* [NumPy](http://www.numpy.org/)
* [SciPy](https://www.scipy.org/)
* [nltk](http://www.nltk.org/)
  - natural language processing toolkit
* [Requests](http://docs.python-requests.org/en/master/)
  - allows you to interact with websites (e.g. forms, passwords, etc.) via Python
* [glob](https://docs.python.org/3/library/glob.html)
  - glob will search a directory and return a list of files that meet certain criteria
  - enables you to search for files using wildcard notation (e.g. *.csv)
* [os](https://docs.python.org/3/library/os.html)
  - os allows you access to miscellaneous operating system functions. 
    - used to change working directory in Python
    - good for navigating directories / finding files
    - e.g. get current working directory
* [XlsxWriter](http://xlsxwriter.readthedocs.io/)
  - package for creating Excel `.xlsx` files from Python, with access to much of Excel's formatting and charting functionality
  - writes data to new Excel files; it cannot read or modify existing workbooks (use a reader such as pandas' `read_excel` for that)
  - allows you to create custom charts in Excel
  - allows you to automate workbook formatting
* [Tkinter](https://docs.python.org/3/library/tk.html)
  - gui builder
  - good for quickly adding in user functionality to scripts like message dialog and file dialog boxes
* [pyodbc (ODBC)](https://github.com/mkleehammer/pyodbc/wiki)
  - generic database connection API
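As a rough illustration of how `os` and `glob` from the list above fit together, here is a minimal sketch. The file names and the temporary directory are made up for the example; in practice you would point at a real data folder.

```python
import glob
import os
import tempfile

# Create a throwaway directory with a few files so the example is self-contained.
# (The file names below are hypothetical.)
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("sales.csv", "costs.csv", "notes.txt"):
        with open(os.path.join(demo_dir, name), "w") as f:
            f.write("placeholder\n")

    # os: remember the current working directory, then change it.
    original_dir = os.getcwd()
    os.chdir(demo_dir)
    try:
        # glob: wildcard search relative to the current working directory.
        csv_files = sorted(glob.glob("*.csv"))
        print(csv_files)  # ['costs.csv', 'sales.csv']
    finally:
        # Always return to where we started.
        os.chdir(original_dir)
```

`glob.glob` does not guarantee any particular order, which is why the result is sorted before printing.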


# <font color = '#526520'> What Else Can Python Do? </font>

This training has introduced Python largely in the context of data analysis, which might leave you wondering, "I thought Python was a general-purpose programming language?! What else can I do with it?"

That is a great question to which the answer is "**LOTS and LOTS OF COOL STUFF!**". Let's take a look...

* [PyPI: the Python Package Index](https://pypi.python.org/pypi) - this is a repository of over 117K packages covering a variety of topics, domains, and applications. The index is administered by the PyPI admins and is the official package repository for Python.

* [Curated List of Awesome Python Resources](https://github.com/vinta/awesome-python) - although not an *official* Python resource, this is a great list of popular Python packages categorized by task.

When you find a package that you want to work with, your next step is to install the package into your Python environment.

# <font color = '#526520'> Managing Python Packages </font>

## What Is There to Manage?

The Anaconda Python Distribution makes life a little easier when getting started with Python as it comes preconfigured with many of the most popular Python packages for data analysis. However, you will come to a point where a package you'd like to use is not included and so you will have to download and install it. Furthermore the packages you use will continue to develop and include new features over time, which will require you to update your packages. All of this, and more, is broadly defined as **package management** and there are tools to make it a simple task.

## Conda Install

The Anaconda Python Distribution has its own package manager called **conda**. The complete documentation is available in the [Conda User Guide: Managing Packages](https://conda.io/docs/user-guide/tasks/manage-pkgs.html), and we will review some of the key commands below.

*It is important to note that you will be running these commands within the **terminal or Anaconda Prompt***

Also there is a concept of 'environments' within Python, which is a way to have different versions of Python and associated packages all on the same machine. We will skip this for now, but it is mentioned because if you maintain different environments then you will want to specify which environment you are managing. If this is something you want to explore then the [Conda User Guide: Managing Environments](https://conda.io/docs/user-guide/tasks/manage-environments.html) is a good place to start.

### Search if a package is available for installation
`conda search [package name]`

### Return a list of installed packages
`conda list`

### Install a package
`conda install [package name]`

### Update a package
`conda update [package name]`

### Removing a package
`conda remove [package name]`

## Pip Install

Pip is another recommended tool for Python package management. It exists independently of Anaconda, although it is included in the distribution and the two work in conjunction. Typically you will use `pip install` for packages that are not available through conda.

Again we will cover the basic commands, but it is worth looking through the [Pip documentation](https://pip.pypa.io/en/stable/), especially the *Quickstart* section.

### Search if a package is available for installation
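`pip search [package name]`

* Note: PyPI has disabled the search interface that this command relies on, so on current versions of pip it returns an error. Search for packages on the [PyPI website](https://pypi.org/) instead.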

### Return a list of installed packages
`pip list`

* If you're using the Anaconda distribution and install a package via pip, run this command to verify that the package was installed and is visible to your Anaconda environment: the package name should appear in the list.

### Install a package
`pip install [package name]`

### Update a package
`pip install --upgrade [package name]`

### Removing a package
`pip uninstall [package name]`
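If you would rather confirm from inside Python that a package is installed in the active environment, the standard library's `importlib.metadata` module (Python 3.8+) can report its version. The sketch below is one possible approach; `installed_version` is a hypothetical helper name, not part of pip or conda.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is not found."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pip is installed in this environment, otherwise None.
print(installed_version("pip"))

# A name that should not exist, to show the "not installed" case.
print(installed_version("surely-not-a-real-package-name-12345"))
```

This checks the environment that the running interpreter belongs to, which is useful when you maintain more than one environment on the same machine.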

# <font color = '#526520'> Importing Packages into Python Session </font>

Once you have installed the packages you need, all there is left to do is import them into your Python session using the **`import`** keyword.

You can also use the **`import [...] as [...]`** convention to alias a package name, thereby making it easier to reference.

```python
# How to import a package
import pandas

# Example of package aliasing
import numpy as np

# Example of calling a package (remember explicit not implicit)
# create a pandas DataFrame
df = pandas.DataFrame(['test','this'])
print(df,'\n')

# call numpy
absolute_value = np.abs(-100)
print('absolute value of -100 =', absolute_value)
```

```
      0
0  test
1  this 

absolute value of -100 = 100
```


# <font color = '#526520'>APIs</font>

## What is an API

API stands for "Application Programming Interface".

Just as a module gives you access to functions, objects, and methods that someone else has built, an API gives you access to functions, objects, and methods that are specific to a certain program.

An example of an API is an ATM. 

A lot of the ATM functionality (e.g. setting the balance directly) cannot be accessed by the public. The only way to interact with an ATM is through a set of pre-defined functions that allow you to perform certain allowed actions (e.g. 'deposit money', 'withdraw money').

The same goes for programs with APIs: they allow you to access certain functions or information via Python without giving you full access to all of the "under the hood" functionality.
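The ATM analogy can be sketched as a small Python class. This is a toy model for illustration only: the class and method names are made up, and a real API would usually be reached over a network or through a vendor's package rather than defined locally.

```python
class ATM:
    """Toy model of the ATM analogy: a small public interface over hidden state."""

    def __init__(self, opening_balance=0):
        # Leading underscore: by convention, callers should not touch this directly.
        self._balance = opening_balance

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount
        return self._balance

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("withdrawal must be positive")
        if amount > self._balance:
            raise ValueError("insufficient funds")
        self._balance -= amount
        return self._balance

    def check_balance(self):
        return self._balance


atm = ATM(opening_balance=100)
atm.deposit(50)
atm.withdraw(30)
print(atm.check_balance())  # 120
```

The caller only ever uses `deposit`, `withdraw`, and `check_balance`; how the balance is stored and validated stays behind the interface.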

### Why should I care about APIs?

When you combine Python with an API you are able to take advantage of the things that make Python so great (e.g. clean code, simple easy to understand syntax, ability to build scripts quickly) while accessing advanced functionality. 

Below are some examples of some APIs that have been explored at  . This is by no means an exhaustive list of APIs.

For example, Google has a huge API library that will allow you to access a lot of Google apps via Python.

APIs can be thought of as sandboxes where the biggest limitation is your own creativity.


## Geopy + Google Maps (or other Geocoding APIs)
* Can be used to convert addresses to longitude / latitude values 
* Can also be used to gain access to other attributes about locations (e.g. if looking at city can see what state / country it belongs to).


## Tableau 
* TabPy -- allows you to run Python calculations (including module methods) on Tableau data and return results
* Can build out fields / formulas / change data sources
* Save out Tableau data extracts from Python

## cx_Oracle / teradata
* connect to and run queries in Oracle / Teradata
* can output query results into DataFrames or other Python data structures
* can input data from flat files or DataFrames into Oracle / Teradata tables
* can set queries to run automatically at certain times or after certain conditions have been met
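Running the real thing requires database credentials and drivers, so the sketch below swaps in the standard library's `sqlite3` module with an in-memory database as a stand-in. It is not cx_Oracle or teradata code. The general connect, cursor, execute, fetch pattern is similar across Python database drivers, but connection details and parameter placeholder syntax differ between them. The table and column names are invented for the example.

```python
import sqlite3

# In-memory SQLite database, used purely as a stand-in for Oracle / Teradata.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Load a few rows into a (hypothetical) table.
cur.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
rows_to_load = [(1, "East", 120.0), (2, "West", 80.5), (3, "East", 45.25)]
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows_to_load)
conn.commit()

# Run a query and pull the results into a plain Python dictionary.
cur.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
)
totals = dict(cur.fetchall())
print(totals)  # {'East': 165.25, 'West': 80.5}

conn.close()
```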

## Google Analytics
* Access Google Analytics data without relying on GUI
* Build out custom reports
* Automate data downloads

## JIRA 
* Automate ticket updates
* Build out custom reports for current tickets


# <font color = '#526520'> The _"Pythonic"_ Way </font>


## Less coding more using
Python favors less coding and more using. This has resulted in a proliferation of useful modules (aka libraries) that make all kinds of complex functionality possible. 

If you have a project that requires a complex algorithm, check the internet: there is probably a Python library that can help.

Library usage requires some discretion, but in most cases can make it incredibly easy to piece together complex workflows with minimal effort. 


## One-liners
Python coders like to play "code golf" when they program -- meaning that they often will try to develop programs using as few lines of code as possible. 

There are many 'Pythonic' shortcuts that you may want to explore as your skills progress.

Having an understanding of these will not only make your code cleaner, easier to understand, and more efficient but will also help you understand why certain approaches are preferred on popular help sites like Stack Overflow.

* [List Comprehensions](http://treyhunner.com/2015/12/python-list-comprehensions-now-in-color/)
* [Lambda Functions](https://www.programiz.com/python-programming/anonymous-function)
* [Method Chaining](https://stackoverflow.com/questions/41817578/basic-method-chaining)
* [Zip Function](https://docs.python.org/3/library/functions.html#zip) 
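To give a flavor of the four idioms listed above, here is a short sketch with made-up sample data. Each one replaces what would otherwise be a multi-line loop or a series of temporary variables.

```python
numbers = [1, 2, 3, 4, 5]
names = ["ann", "bob", "cy"]
ages = [31, 27, 45]

# List comprehension: build a new list in a single expression.
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)  # [4, 16]

# Lambda: a small anonymous function, used here as a sort key.
by_length = sorted(names, key=lambda name: len(name))
print(by_length)  # ['cy', 'ann', 'bob']

# zip: walk two sequences in parallel, here to pair names with ages.
age_by_name = dict(zip(names, ages))
print(age_by_name)  # {'ann': 31, 'bob': 27, 'cy': 45}

# Method chaining: each string method returns a new string, so calls can be chained.
cleaned = "  Hello, World  ".strip().lower().replace(",", "")
print(cleaned)  # hello world
```

Shorter is not always clearer: if a one-liner becomes hard to read, the longer form is usually the better choice.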


## HELP! How to find Python help when you are stuck

The good news: Python has a strong open-source community that has created a large number of tutorials and documentation pages to help with almost any problem in Python.

The bad news: not all sources are created equal -- below is a suggested "hierarchy of trustworthiness" for when you are evaluating sources.

1. Python documentation / Package documentation (find using Google)
2. Stack Overflow (suggest searching for problem in Google and selecting top Stack Overflow link).
3. Tutorial websites (e.g.)

For more on the philosophy behind writing the best Python code possible, check out the [Zen of Python](https://www.python.org/dev/peps/pep-0020/).

# <font color = '#526520'> Closing Remarks </font>

<img src= 'https://i.imgur.com/CfSwf.jpg'/>


# <font color = '#526520'> Live Coding Exercise </font>

This training is by no means an exhaustive exploration of Python, but rather a capable introduction meant to pique your interest and begin your journey into Python programming. You might not be an expert *Pythonista* (yet!), but don't fool yourself: you now know enough about Python to start writing scripts and developing programs.

Don't believe it?! Let's prove it by working together through a coding exercise in which you will touch upon the following:

* Working with the *working directory*
* Creating and working with Pandas DataFrames 
* A few NumPy operations
* Defining your own function!
* Data I/O

Open the *Live_Code.py* file with the Spyder IDE. Using the commented instructions as a guide, we will practice writing a program to read in data, operate on DataFrames, perform a quintile analysis, and write the results back to disk.

*Don't overthink this exercise: the task can be completed in fewer than 30 lines of actual code*

*This will be a great opportunity to really **warm-up to the documentation** and get used to finding answers to your questions online*


# <font color = '#526520'> Example Time </font>