# Python Tutorials

### Data input

Solvertank Digital Science   
[http://www.solvertank.com](http://www.solvertank.com)   
<img src="cube.gif" align="left" width="50" />

The DataFrame is the main format for working with tabular data in Python. It is defined by pandas, a free data analysis library (https://pandas.pydata.org/).

In [None]:
# import pandas
import pandas as pd

## Manual

In [None]:
data = {'days': [12,53,45,11,23], 'age': [12,18,23,47,19], 'sex': ['M','F','M','M','F']}
df = pd.DataFrame(data, columns = ['days', 'age', 'sex'])

In [None]:
df.head()
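
After building a DataFrame by hand, it can help to confirm its size and column types before going further. The sketch below reuses the same illustrative data as the cell above; the variable names `n_rows`, `n_cols`, `numeric_cols` and `mean_age` are only for this example.

```python
import pandas as pd

# Same illustrative data as the manual example above
data = {'days': [12, 53, 45, 11, 23],
        'age': [12, 18, 23, 47, 19],
        'sex': ['M', 'F', 'M', 'M', 'F']}
df = pd.DataFrame(data, columns=['days', 'age', 'sex'])

# shape is a (rows, columns) tuple
n_rows, n_cols = df.shape

# keep only the names of the numeric columns
numeric_cols = df.select_dtypes(include='number').columns.tolist()

# a simple summary statistic on one column
mean_age = df['age'].mean()

print(n_rows, n_cols, numeric_cols, mean_age)
```

`df.dtypes` and `df.describe()` give a fuller picture in the same spirit.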

## CSV from remote file


In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv')
# other examples
# df = pd.read_csv('data.tsv', sep='\t')  # tab-separated file
# df = pd.read_csv('data.csv', sep='|')   # pipe-separated file

In [None]:
df.head()
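
The `sep` parameter mentioned in the comments above can be tried without any file on disk. This is a minimal sketch that assumes small made-up tables held in memory; the strings `raw_pipe` and `raw_tab` and their contents are hypothetical stand-ins for real files.

```python
import io
import pandas as pd

# Hypothetical in-memory text standing in for a pipe-separated file
raw_pipe = "name|score\nana|9.5\nbob|7.0\n"
df_pipe = pd.read_csv(io.StringIO(raw_pipe), sep='|')

# The same table written with tabs instead of pipes
raw_tab = "name\tscore\nana\t9.5\nbob\t7.0\n"
df_tab = pd.read_csv(io.StringIO(raw_tab), sep='\t')

# Both separators should produce the same DataFrame
print(df_pipe.equals(df_tab))
```

`read_csv` accepts a path, a URL, or any file-like object, so the same call works once the in-memory text is replaced by a real file name.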

## Excel from local file

In [None]:
# pd.read_excel needs an Excel engine installed (for example, openpyxl for .xlsx files)
df = pd.read_excel('iris.xlsx', sheet_name='Sheet1')

In [None]:
df.head()

## Scikit datasets
Scikit-learn is a free library for data mining and data analysis.  
It ships with several datasets for testing and training:  
https://scikit-learn.org/stable/datasets/

In [None]:
# import iris dataset from Scikit
from sklearn.datasets import load_iris

In [None]:
iris = load_iris()

In [None]:
iris['data']

In [None]:
iris['feature_names']

In [None]:
iris['DESCR']

In [None]:
# convert Scikit dataset to Pandas dataframe format
import numpy as np # numpy is a free library, required for this command
df = pd.DataFrame(np.c_[iris['data'], iris['target']], columns= np.append(iris['feature_names'], ['target']))

In [None]:
df.head()
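
As an alternative to the manual NumPy conversion above, recent scikit-learn releases (0.23 and later) can return the dataset as a pandas DataFrame directly. The sketch below assumes such a version is installed; the names `df_iris` and `species` are chosen only for this example.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to return pandas objects
iris = load_iris(as_frame=True)

# .frame holds the four feature columns plus an integer 'target' column
df_iris = iris.frame

# translate the integer class codes into readable species names
code_to_name = dict(enumerate(iris.target_names))
df_iris['species'] = df_iris['target'].map(code_to_name)

print(df_iris.head())
```

With this approach the `target` column keeps its integer type, whereas stacking features and target with `np.c_` converts everything to floating point.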

### Other dataset repositories for testing and training
UC Irvine: http://archive.ics.uci.edu/ml/index.php   
Kaggle: https://www.kaggle.com/datasets   
AWS: https://registry.opendata.aws/   
Wikipedia: https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research

## MySQL

For this example, you need to provide your MySQL connection parameters.  
You also need to install the MySQL connector (https://anaconda.org/anaconda/mysql-connector-python).

In [None]:
import mysql.connector

In [None]:
config_host = '<REPLACE BY HOST/SERVER>'
config_user = '<REPLACE BY USERNAME>'
config_password = '<REPLACE BY PASSWORD>'
config_database = '<REPLACE BY DATABASENAME>'

In [None]:
connection = mysql.connector.connect(user=config_user, password=config_password, host=config_host, database=config_database)
df = pd.read_sql('SELECT * FROM TABLE_NAME', con=connection)
connection.close()

In [None]:
df.head()
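
The `pd.read_sql` pattern above can be tried without a MySQL server. The following sketch swaps in Python's built-in `sqlite3` module as a stand-in database; the `patients` table and its rows are hypothetical. It also passes the filter value through `params` instead of building the SQL string by hand.

```python
import sqlite3
from contextlib import closing

import pandas as pd

# closing() guarantees connection.close() runs even if a query fails
with closing(sqlite3.connect(':memory:')) as connection:
    # hypothetical table, created only for this example
    connection.execute(
        'CREATE TABLE patients (days INTEGER, age INTEGER, sex TEXT)')
    connection.executemany(
        'INSERT INTO patients VALUES (?, ?, ?)',
        [(12, 12, 'M'), (53, 18, 'F'), (45, 23, 'M')])

    # the value 18 is bound to the ? placeholder by the driver
    df_sql = pd.read_sql(
        'SELECT * FROM patients WHERE age >= ? ORDER BY age',
        con=connection,
        params=(18,))

print(df_sql)
```

Placeholder syntax depends on the driver: `sqlite3` uses `?`, while MySQL Connector/Python uses `%s`.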

## Snowflake

For this example, you need to provide your Snowflake connection parameters.  
You also need to install the Snowflake connector (https://anaconda.org/conda-forge/snowflake-connector-python).

In [None]:
import snowflake.connector

In [None]:
connection = snowflake.connector.connect(
  user='<REPLACE BY USERNAME>',
  password='<REPLACE BY PASSWORD>',
  account='<REPLACE BY ACCOUNT>'
)

In [None]:
cursor = connection.cursor()
cursor.execute('USE WAREHOUSE <REPLACE BY WAREHOUSE NAME>')
cursor.execute('USE DATABASE <REPLACE BY DATABASE NAME>')
cursor.execute('USE SCHEMA <REPLACE BY SCHEMA NAME>')

In [None]:
data = cursor.execute('SELECT * FROM TABLE_NAME')

In [None]:
# convert Snowflake dataset to Pandas dataframe format
df = pd.DataFrame.from_records(iter(data), columns=[x[0] for x in data.description]) 

In [None]:
df.head()

In [None]:
cursor.close()
connection.close()
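
The cursor-to-DataFrame conversion used above relies on the standard Python DB-API: after a query, `cursor.description` holds one entry per result column, with the column name first. The sketch below shows the same pattern with the built-in `sqlite3` module as a stand-in for Snowflake, so it runs without an account; the table `t` and its rows are hypothetical.

```python
import sqlite3

import pandas as pd

connection = sqlite3.connect(':memory:')
cursor = connection.cursor()
try:
    # hypothetical table, created only for this example
    cursor.execute('CREATE TABLE t (id INTEGER, label TEXT)')
    cursor.executemany('INSERT INTO t VALUES (?, ?)',
                       [(1, 'a'), (2, 'b')])

    cursor.execute('SELECT id, label FROM t ORDER BY id')

    # the first item of each description entry is the column name
    column_names = [col[0] for col in cursor.description]

    # fetchall() returns the rows as a list of tuples
    df_cursor = pd.DataFrame.from_records(cursor.fetchall(),
                                          columns=column_names)
finally:
    # release resources even if one of the statements above fails
    cursor.close()
    connection.close()

print(df_cursor)
```

Wrapping the work in `try`/`finally` keeps the cursor and connection from staying open when a query raises an error.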