# Python for R Users
- Author: Sylvia Tran
- Date: September 9, 2019

[GitHub Reference](https://github.com/godsylla/python-for-R-users)

![alt text][python-and-r]

[python-and-r]: https://github.com/godsylla/python-for-R-users/blob/sylvia/assets/python-are-r-friends.png "Python"

### Introduction

This notebook is a tutorial on Python for R users. It draws out similarities between Python and R in the hope that those new to Python will find it less intimidating and more approachable.

This is an interactive Python notebook (`.ipynb`) in the repository. It can be run in Jupyter Notebook (for example, via Anaconda) with a Python 3 kernel. The repository also includes a Python script (`python-for-r-users.py`) that contains the same code without any of the markdown cells.

The content covered will leverage the following packages:

- [pandas](https://pandas.pydata.org/pandas-docs/stable/) (for data frame shaping, cleaning)
- [numpy](https://docs.scipy.org/doc/numpy/reference/index.html) (for any numeric python required)
- [scikit-learn](https://scikit-learn.org/stable/user_guide.html) (for train-test splitting, feature scaling, modeling, and model metrics)

To explore these packages further, refer to their online documentation. All three are widely used in data science and data analysis, so examples are plentiful. When using numpy, pandas, or scikit-learn, read the documentation carefully: defaults and implementation details sometimes differ subtly from R's, so the same or a similar task can behave differently from what you would expect coming from R.
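One concrete instance of such a nuance, as a small sketch: R's `sd()` divides by `n - 1`, while `np.std` divides by `n` unless you pass `ddof=1`. pandas' `.std()` defaults to `ddof=1`, which matches R. The `values` list below is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative numbers only; not taken from any dataset in this tutorial.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_sd = np.std(values)         # divides by n (ddof=0 is numpy's default)
sample_sd = np.std(values, ddof=1)     # divides by n - 1, like R's sd()
pandas_sd = pd.Series(values).std()    # pandas defaults to ddof=1

print('numpy default: ', population_sd)
print('numpy ddof=1:  ', sample_sd)
print('pandas default:', pandas_sd)
```

If a result differs slightly between R and Python, a default like this is a good first thing to check.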

### Table of Contents:
  1. Importing Packages
  2. Loading Toy Datasets (sklearn)
  3. Cursory Inspection (pandas & numpy)
  4. Light Cleaning (base python, pandas)
  5. Train-test-split (sklearn)
  6. Feature Scaling (sklearn)
  7. Model (sklearn)
  8. Model Evaluation (sklearn)

#### 1. Importing Packages

- R: `library('package_name')`
- Python: `import package_name`

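If the package is not installed, you can install it from a terminal with
* `pip install package_name`


To install it from inside the notebook instead, run one of these in a code cell (the `%pip` magic targets the environment of the running kernel):
* `!pip install package_name`
* `%pip install package_name`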
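Before reaching for `pip`, you can check whether a package is already importable. The sketch below uses the standard library's `importlib.util.find_spec`; the helper name `is_installed` is made up for this example and is meant for top-level package names only.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the import system."""
    # find_spec returns None when no module with that name is found.
    return importlib.util.find_spec(package_name) is not None


print('json:', is_installed('json'))    # standard library, always present
print('not_a_real_package_xyz:', is_installed('not_a_real_package_xyz'))
```

This is roughly the role `requireNamespace('package_name', quietly = TRUE)` plays in R.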

```python
# The `np` and `pd` nicknames are convention
import numpy as np
import pandas as pd
```
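The aliasing above is one of a few import styles you will see. As a rough sketch, here they are side by side, using the standard library's `statistics` module so the cell runs without any extra installs (the `scores` list is made up for illustration):

```python
import statistics                  # call functions as statistics.mean(...)
import statistics as st            # same module under a short alias, like `np` / `pd`
from statistics import median      # pull a single name into the current namespace

scores = [3, 1, 4, 1, 5]

mean_full = statistics.mean(scores)
mean_alias = st.mean(scores)
middle = median(scores)

print(mean_full, mean_alias, middle)
```

Unlike R's `library()`, which attaches a package's exported names to the search path, a plain `import` keeps everything behind the module prefix. That is why Python code reads `np.mean(...)` rather than `mean(...)`.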

#### 2. Loading Toy Datasets

* R: typically found in the `datasets` package
* Python: we'll be using toy datasets from the `sklearn.datasets` package.

Since the dataset lives in a package, we'll apply what we just learned: importing packages.

**BORING.... I know!**

But hey! It's an INTRO, so you all have to suffer this with me.



```python
# Note: `load_boston` was removed in scikit-learn 1.2, so this cell needs an older version.
from sklearn.datasets import load_boston

boston = load_boston()
print('boston: ', type(boston))            # R: print(class(boston))

# The feature values of a sklearn toy dataset live in its `.data` attribute.
# Wrap that array in a pandas DataFrame.
boston_df = pd.DataFrame(boston.data)
print('boston_df: ', type(boston_df))
```

```
boston:  <class 'sklearn.utils.Bunch'>
boston_df:  <class 'pandas.core.frame.DataFrame'>
```
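On scikit-learn releases that no longer ship `load_boston`, the same pattern works with any other bundled toy dataset. The sketch below swaps in `load_diabetes` (my substitution, not the dataset this tutorial uses) and also passes `feature_names` so the DataFrame gets readable column names instead of `0, 1, 2, ...`.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: load_diabetes stands in for load_boston purely for illustration.
diabetes = load_diabetes()

# A Bunch behaves like a dictionary whose keys are also available as attributes.
print(list(diabetes.keys()))

# Build the DataFrame with named columns, then attach the target as its own column.
diabetes_df = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
diabetes_df['target'] = diabetes.target

print(diabetes_df.shape)       # R: dim(diabetes_df)
print(diabetes_df.head())      # R: head(diabetes_df)
```

Keeping the target in the same DataFrame is convenient for inspection. For modeling you would typically split it back out into a separate `y`.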



#### Additional Resources can be found here:

- [Numpy Tutorial](http://cs231n.github.io/python-numpy-tutorial/)
- [Scikit Learn - More Documentation](https://scikit-learn.org/stable/index.html)