Basic programming with Python 
---
---
**Outline**

1. Introduction
2. Variable types
3. Reading and writing text/binary files
4. Working with data tables
  *   Basic statistics  
  *   Plots with matplotlib 


---

Dataset: https://archive.ics.uci.edu/ml/datasets/wine 


**1. Introduction** 
---
Check which packages you have already installed
---
1.1. Display a list of installed package names and version numbers

In [None]:
pip list

1.2. Check specific installed packages

a. numpy (Numerical python) https://numpy.org/doc/stable/user/absolute_beginners.html

b. pandas https://pandas.pydata.org/docs/user_guide/index.html#user-guide

c. matplotlib  https://matplotlib.org/stable/index.html

In [None]:
pip show numpy

In [None]:
pip show pandas

In [None]:
pip show matplotlib
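
The same check can also be made from inside Python. This is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); `installed_version` is a hypothetical helper name, not part of any library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the three packages used in this notebook.
for name in ("numpy", "pandas", "matplotlib"):
    print(name, installed_version(name))
```

A result of `None` means the package still has to be installed, for example with `pip install`.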



---
Use packages
---

To use a package, or the code in another module, import it first.

1.3. Import and use packages

In [None]:
import numpy as np
a = np.arange(6)
print(a)

In [None]:
a1D = np.array([0, 1, 2, 3, 4])
a2D = np.array([[1, 2], [3, 4]])
a3D = np.array([[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]])
print(a1D) 
print(a2D[0])
print(a3D)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt


**2. Variable types** 
---
---


In [None]:
# numeric types: int, float, complex, bool
b1 = 5.1 # float
b2 = 5 # int
b3 = 1<2 # boolean
print(b1, ' is',type(b1), '\n')
print(b2, ' is',type(b2), '\n')
print(b3, ' is',type(b3), '\n')

In [None]:
# lists 
c = ['karin', 'sasaki', 345.453]
print(c, ' is', type(c), '\n')

In [None]:
# strings 
print(c[0], ' is',type(c[0]), '\n')

In [None]:
# tuples
d = (123, 'john')
print(d, ' is', type(d), '\n')

In [None]:
# dictionaries
e = {'name': 'marcie','code':63214, 'dept': 'mathematics'}
print(e, ' is', type(e), '\n')

In [None]:
# Numpy array
f = np.array([1,2,3,4,5])
print(f, ' is', type(f), '\n')

In [None]:
# Note: this is a plain list, which is a different type from the NumPy array above
h = [1,2,3,4,5]
print(h, ' is', type(h), '\n')


**3. Reading and writing text and binary files**
---
Write to text files
---
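*open()*: Built-in function; no module needs to be imported to use it. It takes two main arguments:
1. The file name, if the file is in the current working directory, or the full path to the file (r"file_name").
2. The access mode: **r** (read), **r+** (read and write), **w** (write, overwriting any existing content), **w+** (write and read, overwriting any existing content), **a** (append), **x** (create, failing if the file already exists).

Example: object = open("file.txt", "r")

Note: An r placed before the opening quote of the file name, as in r"file_name", makes it a raw string, so that backslashes in the path are not treated as escape characters. This prefix is unrelated to the "r" access mode.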
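
The effect of the access modes listed above can be sketched as follows. The example works in a temporary directory so that no real files are touched; the file name `modes_demo.txt` is made up for illustration.

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

    with open(path, "w") as f:      # "w": create the file, or truncate it if it exists
        f.write("first\n")
    with open(path, "a") as f:      # "a": append to the end, keeping existing content
        f.write("second\n")
    with open(path, "r") as f:      # "r": read only
        after_append = f.read()

    with open(path, "w") as f:      # "w" again: the earlier content is discarded
        f.write("third\n")
    with open(path, "r") as f:
        after_overwrite = f.read()

    try:
        open(path, "x")             # "x": create only; fails because the file exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(after_append))      # 'first\nsecond\n'
print(repr(after_overwrite))   # 'third\n'
print(exclusive_failed)        # True
```

Opening files in a `with` block closes them automatically, even if an error occurs part-way through.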


Two ways to write to a file:

*write()*: Writes a single string.

*writelines()*: Writes each string in a list, one after another, without adding newlines between them.

How to specify number of decimals and string format -->
https://docs.python.org/3/library/string.html#format-specification-mini-language
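
A short sketch of both write methods, combined with a format specification that fixes the number of decimals. The values and the file name `numbers.txt` are made up for illustration.

```python
import os
import tempfile

values = [3.14159, 2.71828]  # made-up numbers

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")  # hypothetical file name

    with open(path, "w") as f:
        # write() takes one string; newlines must be added explicitly.
        f.write("value\n")
        # writelines() takes an iterable of strings and adds no separators.
        # The format spec ".2f" keeps two digits after the decimal point.
        f.writelines(f"{v:.2f}\n" for v in values)

    with open(path) as f:
        content = f.read()

print(content)
```

The file then holds the header line followed by `3.14` and `2.72`, one per line.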



---

3.1. Write a message in a txt file.

In [None]:
# "a" appends to the file, creating it if it does not exist
with open("helloWorld.txt", "a") as file1:
    file1.write("Hello World \n I am Hernan")

Read text files
---

There are three ways to read data from a text file.

***read()***

object.*read()*: reads the entire file

object.*read(n)*: reads at most *n* characters (or *n* bytes if the file was opened in binary mode)

***readline()***: reads one line of the file and returns it as a string.

***readlines()***: reads all the lines and returns them as a list with one string per line.
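
The three read methods can be compared on a small file. This sketch writes a temporary file first (the name `lines.txt` and its content are made up) and then reads it back in different ways.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("alpha\nbeta\ngamma\n")

    with open(path) as f:
        whole = f.read()              # entire file as one string
    with open(path) as f:
        first_three = f.read(3)       # first 3 characters: 'alp'
        rest_of_line = f.readline()   # continues from the current position: 'ha\n'
    with open(path) as f:
        all_lines = f.readlines()     # list of lines, each keeping its '\n'
    with open(path) as f:
        stripped = [line.rstrip("\n") for line in f]  # iterate line by line

print(repr(whole))
print(repr(first_three), repr(rest_of_line))
print(all_lines)
print(stripped)
```

Iterating over the file object, as in the last step, avoids loading a large file into memory all at once.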


---
3.4. Read a text file



In [None]:
# Read the whole file at once
with open("helloWorld.txt", "r") as file4:
    print(file4.read())

In [None]:
# Read only the first line
with open("helloWorld.txt", "r") as file4:
    print(file4.readline())

Write and Read csv files
---



---
3.9. Use *pandas* to read a table from, or write a table to, a file

https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#csv-text-files
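
As a sketch of the write direction, a data frame can be written with `to_csv` and read back with `read_csv`. The table below holds made-up values, and an in-memory text buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Made-up values, only to illustrate the round trip.
df = pd.DataFrame({"sample": ["a", "b", "c"], "value": [1.5, 2.0, 3.25]})

buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row index column
csv_text = buffer.getvalue()

# Read the same text back into a new data frame.
df_back = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(df_back.equals(df))
```

Passing a file name such as `'my_table.csv'` (a hypothetical name) instead of the buffer writes to disk in the same way.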

In [None]:
# Read a data frame from a CSV file
df_data = pd.read_csv('SortedNuclei.csv')
df_data

Handling the Current Working Directory
---

***os.path***: submodule of the operating system module (*os*) that provides functions to manipulate paths.

---

Current Working Directory (CWD): the directory in which you are currently operating.
*   Files in the CWD can be opened using only their name.
*   It is the folder from which the Python script is run.

https://docs.python.org/3/library/os.html
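
A small sketch of how relative file names depend on the CWD. It temporarily switches to a throwaway directory with `os.chdir`, creates a file by name only, and then restores the original CWD; the file name `note.txt` is made up.

```python
import os
import tempfile

original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp_dir:
    os.chdir(tmp_dir)                      # make the temporary folder the CWD
    try:
        with open("note.txt", "w") as f:   # relative name: created inside the CWD
            f.write("hello")
        # The same file is reachable through its full path.
        full_path = os.path.join(os.getcwd(), "note.txt")
        found_in_cwd = os.path.exists(full_path)
    finally:
        os.chdir(original_cwd)             # always restore the previous CWD

print(found_in_cwd)                  # True
print(os.getcwd() == original_cwd)   # True
```

Building full paths with `os.path.join` makes a script independent of where it is started from.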


3.10. Get the location of your Current Working Directory (CWD)

In [None]:
import os
parent_dir= os.getcwd()

3.11. Open a file using the path      


In [None]:
name_file = "SortedNuclei.csv"     
path_temp = os.path.join(parent_dir, name_file)
pd.read_csv(path_temp)

**4. Working with data tables** 
---
4.0. Playing with arrays, lists, and dictionaries
---
Array vs list 

https://favtutor.com/blogs/python-array-vs-list
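
A brief sketch of the practical difference: the same operators behave differently on a list and on a NumPy array.

```python
import numpy as np

numbers_list = [1, 2, 3]
numbers_array = np.array([1, 2, 3])

print(numbers_list * 2)      # list repetition: [1, 2, 3, 1, 2, 3]
print(numbers_array * 2)     # element-wise arithmetic: [2 4 6]
print(numbers_list + [4])    # concatenation: [1, 2, 3, 4]
print(numbers_array + 4)     # the scalar is added to every element: [5 6 7]

mixed_list = [25, "temperatures", ["day1", "day2"]]  # a list may mix types
print(numbers_array.dtype)   # an array has a single element type
```

Arrays are therefore the natural choice for numerical work, while lists suit mixed or nested collections.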


In [None]:
# Create an array
ones = np.ones(3)

In [None]:
# Creating a list  
sample_list = [25,"temperatures",['day1','day2']] 
print(sample_list)

In [None]:
# Creating a dictionary
sample_dict = {
  "Temperature": ["Centigrades", "Fahrenheit"],
  "Experiment": "1",
  "year": 2021
}
#sample_dict
print(sample_dict["Temperature"])

---
4.1. Basic statistics
---
https://numpy.org/doc/stable/reference/routines.statistics.html

https://docs.python.org/3/tutorial/datastructures.html

Use numpy: 

***mean***: Compute the arithmetic mean along the specified axis.

***median***: Compute the median along the specified axis.

***var***: Compute the variance along the specified axis.

***std***: Compute the standard deviation along the specified axis.

***amin***: Return the minimum of an array or minimum along an axis.

***amax***: Return the maximum of an array or maximum along an axis.
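
The meaning of "along the specified axis" is easiest to see on a tiny table. This sketch uses a made-up 2x2 array and `np.min`/`np.max`, which do the same job as `amin`/`amax`.

```python
import numpy as np

table = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

print(np.mean(table))             # all elements: 2.5
print(np.mean(table, axis=0))     # one value per column: [2. 3.]
print(np.mean(table, axis=1))     # one value per row: [1.5 3.5]
print(np.median(table, axis=0))   # [2. 3.]
print(np.var(table))              # population variance (ddof=0): 1.25
print(np.std(table, ddof=1))      # sample standard deviation (divides by N - 1)
print(np.min(table, axis=0), np.max(table, axis=0))  # [1. 2.] [3. 4.]
```

Note that `np.var` and `np.std` divide by N by default; pass `ddof=1` to obtain the sample estimates that divide by N - 1.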

Example: Open a file and get basic statistics of a dataset


In [None]:
#path of the file
import os
import numpy as np
parent_dir = os.getcwd()
name_file = "wine.csv"     
path_wine = os.path.join(parent_dir, name_file)

##define lists
wine = []
alcohol =[]
phenols = []
flavonoids = []
all_data_wine = []

## open the file, read the lines, and split the columns into lists
with open(path_wine) as file_wine:
    data = file_wine.read().splitlines()

for row in data[1:]:  # skip the header line
    data_split = row.split(',')
    wine.append(float(data_split[0]))        # wine type
    alcohol.append(float(data_split[1]))
    phenols.append(float(data_split[6]))
    flavonoids.append(float(data_split[7]))
    # keep the complete row as a float array
    all_data_wine.append(np.asarray(data_split, dtype=np.float64))



In [None]:
# Mean/Median/Variance/STD of arrays
print('alcohol:', round(np.mean(alcohol),2), 'std:', round(np.std(alcohol),2))
print('phenols :',round(np.mean(phenols),2), 'std:', round(np.std(phenols),2))
print('flavonoids:', round(np.mean(flavonoids),2), 'std:',round(np.std(flavonoids),2))
print('median and var of alcohol:', np.median(alcohol),'var:', np.var(alcohol))

# Column names, taken from the first line of the file (skipped when reading the values)
header_wine = data[0].split(',')
print('Characteristics:', header_wine[6:8])
print('Mean of all:  ', np.round(np.mean(all_data_wine, axis=0)[6:8],2))
print('Std of all:  ', np.round(np.std(all_data_wine, axis=0)[6:8],2))
#print('Mean of rows:', np.mean(all_data_wine, axis=1)[0:6])

 
---
4.2. Plotting with matplotlib
---
https://matplotlib.org/stable/tutorials/introductory/sample_plots.html

***matplotlib.pyplot*** is a collection of functions with which we can create a figure, create a
plotting area, plot points, plot lines, and decorate the plot with titles and labels.

https://matplotlib.org/stable/tutorials/introductory/pyplot.html#sphx-glr-tutorials-introductory-pyplot-py
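
The steps named above (figure, plotting area, points, lines, decoration) can be sketched with made-up data. The sketch selects the non-interactive `Agg` backend so that it runs without a display and writes the figure to an in-memory buffer; inside a notebook, that backend line can be left out and `plt.show()` used instead.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]                 # made-up data
y = [0, 1, 4, 9]

fig, ax = plt.subplots()                 # a figure with one plotting area
ax.plot(x, y, "o", label="points")       # plot points
ax.plot(x, y, "-", label="line")         # plot a line through them
ax.set_title("y = x squared")            # decorate
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")        # write the figure as PNG bytes
plt.close(fig)                           # free the figure's memory
png_bytes = buffer.getvalue()
print(len(png_bytes) > 0)
```

The `fig, ax` style shown here is an alternative to calling `plt.plot` directly; both draw on the same kind of figure.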

Example: Plot two arrays in a figure

In [None]:
import matplotlib.pyplot as plt

plt.plot(wine, alcohol, '.', markersize=5, mfc='black', mec='gray', alpha=0.5 )

plt.title('Wine-alcohol', fontsize=16)
plt.xlabel('Wine type', fontsize=16)
plt.ylabel('Alcohol (%)', fontsize=16)
plt.show()

In [None]:
#histogram plots
plt.hist(x=phenols, bins=4, color='b', alpha=0.7, rwidth=0.9)
plt.grid(axis='y', linestyle='--', alpha=0.65)
plt.xlabel('Phenols (mg/L)', fontsize=16)
plt.show()

In [None]:
# Histogram with custom bin edges, saved to a file
name_MyGraph = 'my_histogram.png'
path_mygraph = os.path.join(parent_dir, name_MyGraph)

plt.hist(x=flavonoids, bins=[1,2,3,4], color='r', alpha=0.7, rwidth=0.9)
plt.grid(axis='y', ls='--', alpha=0.65)
plt.xlabel('flavonoids(%)', fontsize=16)
plt.savefig(path_mygraph)  ##save my figure
plt.show()


In [None]:
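# Histogram with a line drawn through the bin centers
mybins = np.arange(10.5, 15.5, 0.8)  # bin edges, each bin 0.8 wide
shift = 0.4  # half the bin width
n_out, bins_out, patches = plt.hist(x=alcohol, bins=mybins, color='b', alpha=0.7, rwidth=0.8)
# right bin edges minus half a bin width give the bin centers
plt.plot(mybins[1:] - shift, n_out, lw=2, c='m')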
plt.xlabel('alcohol(%)', fontsize=16)
plt.show()
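
The counts and bin edges returned by `plt.hist` can also be computed without drawing anything, using `np.histogram`. This sketch uses made-up samples and shows a general way to get the bin centers from the edges.

```python
import numpy as np

samples = [1, 2, 2, 3, 3, 3]             # made-up values for illustration
edges = np.array([0.5, 1.5, 2.5, 3.5])   # three bins of width 1

counts, edges_out = np.histogram(samples, bins=edges)
# Average each left edge with its right edge to get the bin centers.
centers = (edges_out[:-1] + edges_out[1:]) / 2

print(counts)    # [1 2 3]
print(centers)   # [1. 2. 3.]
```

Computing the centers from the edges works for any bin width, so no separate shift value has to be kept in sync with the bins.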

In [None]:
#Scatter plot

plt.plot(phenols, flavonoids, 's', markersize=5, c='g', alpha=0.6)
plt.plot(phenols[0:50], flavonoids[0:50], 's', markersize=5, c='r', alpha=0.5)
plt.plot(phenols[100:178], flavonoids[100:178], 's', markersize=5, c='y', alpha=1)

plt.title('Phenols-flavonoids', fontsize=16)
plt.xlabel('Phenols (mg/L)', fontsize=16)
plt.ylabel('Flavonoids (mg/L)', fontsize=16)

plt.show()

---
4.4. Plotting images
---

Using ***matplotlib.image*** 

https://matplotlib.org/stable/api/image_api.html?highlight=image#module-matplotlib.image

In [None]:
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread('dogs.png')
imgplot = plt.imshow(img)

In [None]:
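# Enhance contrast by displaying a single channel (index 0 = red) with a colormap
img2 = img[:, :, 0]
imgplot = plt.imshow(img2)
imgplot.set_cmap('rainbow')
plt.colorbar()
# pass the colormap explicitly so the saved image matches the one displayed
plt.imsave('dogs_in_color.png', img2, cmap='rainbow')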
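
What the channel selection does to the array can be sketched on a tiny synthetic image. The 2x3 array below is made up and stands in for a real file; for PNG files, `mpimg.imread` returns float values between 0 and 1, possibly with a fourth (alpha) channel.

```python
import numpy as np

# Synthetic 2x3 RGB image with float values in [0, 1], standing in for a real file.
rgb = np.zeros((2, 3, 3))
rgb[:, :, 0] = 1.0           # red channel fully on
rgb[0, 0, 1] = 0.5           # some green in the top-left pixel

red = rgb[:, :, 0]           # 2-D array: one value per pixel
green = rgb[:, :, 1]

print(rgb.shape)             # (2, 3, 3): rows, columns, channels
print(red.shape)             # (2, 3)
print(green[0, 0])           # 0.5
```

Because a single channel is a plain 2-D array, `plt.imshow` maps its values to colors through a colormap rather than showing the original colors.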

In [None]:
# Histogram of the pixel values in the selected channel
plt.hist(img2.flatten(), bins =50, range=(0,1))
plt.show()
