# HW1 Supplemental Notebook

## [`requests`](https://requests.readthedocs.io/en/latest/)

We can use [`requests`](https://requests.readthedocs.io/en/latest/) to fetch resources from the web, such as webpages or data files.

In [1]:
import requests

In [2]:
r = requests.get("https://raw.githubusercontent.com/kmsaumcis/mcis6273_f17_datamining/master/homework/hw1/bank-data.csv")

if r.status_code == 200: # HTTP OK
    data = r.text # decoded text (str); use r.content for the raw bytes, e.g. binary data

In [3]:
data.splitlines()[:10] # the first 10 lines

['id,age,sex,region,income,married,children,car,save_act,current_act,mortgage,pep',
 'ID12101,48,FEMALE,INNER_CITY,17546,NO,1,NO,NO,NO,NO,YES',
 'ID12102,40,MALE,TOWN,30085.1,YES,3,YES,NO,YES,YES,NO',
 'ID12103,51,FEMALE,INNER_CITY,16575.4,YES,0,YES,YES,YES,NO,NO',
 'ID12104,23,FEMALE,TOWN,20375.4,YES,3,NO,NO,YES,NO,NO',
 'ID12105,57,FEMALE,RURAL,50576.3,YES,0,NO,YES,NO,NO,NO',
 'ID12106,57,FEMALE,TOWN,37869.6,YES,2,NO,YES,YES,NO,YES',
 'ID12107,22,MALE,RURAL,8877.07,NO,0,NO,NO,YES,NO,YES',
 'ID12108,58,MALE,TOWN,24946.6,YES,0,YES,YES,YES,NO,NO',
 'ID12109,37,FEMALE,SUBURBAN,25304.3,YES,2,YES,NO,NO,NO,NO']
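One caveat when breaking downloaded text into lines: `str.split()` with no arguments splits on any whitespace, not only on newlines, while `str.splitlines()` splits on line boundaries. The two agree for the rows shown above because none of them contains a space. The sketch below uses a small made-up sample (the `INNER CITY` value with a space is hypothetical, not taken from the dataset) to show the difference, and then passes the lines to the standard `csv` module to get the individual fields.

```python
import csv

# Hypothetical sample text; "INNER CITY" contains a space on purpose.
sample = "id,region\nID1,INNER CITY\nID2,TOWN\n"

by_whitespace = sample.split()   # breaks on spaces as well as newlines
by_line = sample.splitlines()    # breaks on line boundaries only

# csv.reader accepts any iterable of strings, one string per row.
rows = list(csv.reader(by_line))

print(by_whitespace)
print(by_line)
print(rows)
```

For the real download, the same idea would apply to `data.splitlines()`.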

## [`zipfile`](https://docs.python.org/3/library/zipfile.html)

Let's say we have a zip file we would like to load:

[https://github.com/kmsaumcis/mcis6273_f17_datamining/archive/refs/heads/master.zip](https://github.com/kmsaumcis/mcis6273_f17_datamining/archive/refs/heads/master.zip)

Let's use `requests` to download it, and then use `zipfile` to unzip it.

In [4]:
r = requests.get("https://github.com/kmsaumcis/mcis6273_f17_datamining/archive/refs/heads/master.zip")

if r.status_code == 200:
    data = r.content

Now that we have the contents of the zip file in memory, let's work with them. The simplest approach is to write them to a file on disk:

In [5]:
with open("master.zip", "wb") as fo:
    fo.write(data)

Using [zipfile](https://docs.python.org/3/library/zipfile.html) we can extract:

In [6]:
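import zipfile
with zipfile.ZipFile("master.zip") as zf: # the with block closes the archive when done
    zf.extractall()                       # extracts into the current working directory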

That's all: the files are now unzipped into the current working directory.
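Writing the archive to disk first is not strictly required. As a minimal sketch, assuming the downloaded bytes are already in memory (as with `r.content`), `zipfile.ZipFile` can also read from an `io.BytesIO` wrapper. The helper name `list_zip_members` and the tiny demo archive are hypothetical stand-ins, so the example runs without a network connection.

```python
import io
import zipfile

def list_zip_members(zip_bytes):
    """Return the member names stored in a zip archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return zf.namelist()

# Build a tiny archive in memory to stand in for the downloaded bytes.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("demo/readme.txt", "hello")
    zf.writestr("demo/notes.ipynb", "{}")

member_names = list_zip_members(buffer.getvalue())
print(member_names)
```

With the real download, `list_zip_members(data)` would list the archive's contents before anything is extracted.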

## [`os`](https://docs.python.org/3/library/os.html?highlight=os#module-os), [`glob`](https://docs.python.org/3/library/glob.html?highlight=glob#module-glob), [`shutil`](https://docs.python.org/3/library/shutil.html?highlight=shutil#module-shutil)

Now that we have downloaded the zip file and extracted its contents, we need to learn a few things about the [`os`](https://docs.python.org/3/library/os.html?highlight=os#module-os) library.

Let's imagine that we want to create a directory called `./ipynb` and search the unzipped folder for all `.ipynb` files.  Then we will take those files and **copy** them to the `./ipynb` folder.

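The one trick we will use to pull this off is the `**` wildcard from the [`glob`](https://docs.python.org/3/library/glob.html?highlight=glob#module-glob) library. Without `recursive=True`, `**` behaves like a single `*`, so the pattern used below matches `.ipynb` files exactly one directory below the top-level folder; pass `recursive=True` to match at any depth.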
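To see how `**` behaves with and without `recursive=True`, here is a minimal sketch that builds a throwaway directory tree with `tempfile`. The folder and file names (`lecture_notes`, `homework/hw1`, `a.ipynb`, `b.ipynb`) are made up for illustration and say nothing about the actual layout of the repository.

```python
import glob
import os
import tempfile

def relative_matches(root, recursive):
    """Glob for notebooks under root and return sorted paths relative to root."""
    pattern = os.path.join(root, "**", "*.ipynb")
    found = glob.glob(pattern, recursive=recursive)
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in found)

with tempfile.TemporaryDirectory() as root:
    shallow = os.path.join(root, "lecture_notes")   # one level below root
    deep = os.path.join(root, "homework", "hw1")    # two levels below root
    os.makedirs(shallow)
    os.makedirs(deep)
    for folder, name in [(shallow, "a.ipynb"), (deep, "b.ipynb")]:
        open(os.path.join(folder, name), "w").close()

    one_level = relative_matches(root, recursive=False)  # ** acts like *
    any_depth = relative_matches(root, recursive=True)   # ** spans zero or more directories

print(one_level)
print(any_depth)
```

The non-recursive call misses the notebook two levels down, while the recursive call finds both.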

In [7]:
import os     # for file operations
import glob   # for finding files by wildcard pattern
import shutil # for file copying

In [8]:
nb_file_list = glob.glob("mcis6273_f17_datamining-master/**/*.ipynb")
nb_file_list

['mcis6273_f17_datamining-master/lecture_notes/05_clustering_a.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/13_big_data.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/01_introduction.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/12_text_mining.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/07_classification_a_dt_bayes.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/04_pattern_mining.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/08_classification_b_svm_nn.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/03_distance_metrics.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/09_classification_c_prediction_eval.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/02_preprocessing.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/06_clustering_b.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/10_ensemble_methods.ipynb',
 'mcis6273_f17_datamining-master/lecture_notes/11_visualization.ipynb']

Notice this is just a list of files.  We can use that list to **copy** using [`shutil`](https://docs.python.org/3/library/shutil.html?highlight=shutil#module-shutil).

In [9]:
# make a directory (exist_ok=True avoids an error if the cell is run twice)
os.makedirs("ipynb", exist_ok=True)

In [10]:
# copy the first file into the ipynb directory
shutil.copy(nb_file_list[0], "ipynb")

'ipynb/05_clustering_a.ipynb'

You can now imagine how to produce a loop to copy all the files in the list to the directory.
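One possible version of that loop is sketched below. The helper name `copy_files` is hypothetical, and the demo uses made-up files in a temporary directory so it runs on its own. `shutil.copy` returns the path of each new copy, which the helper collects.

```python
import os
import shutil
import tempfile

def copy_files(file_list, dest_dir):
    """Copy every path in file_list into dest_dir and return the new paths."""
    os.makedirs(dest_dir, exist_ok=True)  # create the target directory if needed
    copied = []
    for path in file_list:
        copied.append(shutil.copy(path, dest_dir))
    return copied

with tempfile.TemporaryDirectory() as root:
    # Create two placeholder source files to copy.
    src_dir = os.path.join(root, "src")
    os.makedirs(src_dir)
    sources = []
    for name in ["one.ipynb", "two.ipynb"]:
        path = os.path.join(src_dir, name)
        with open(path, "w") as fo:
            fo.write("{}")
        sources.append(path)

    dest_dir = os.path.join(root, "ipynb")
    copied = copy_files(sources, dest_dir)
    copied_names = sorted(os.path.basename(p) for p in copied)
    dest_listing = sorted(os.listdir(dest_dir))

print(copied_names)
```

With the notebook's own variables, the call would be `copy_files(nb_file_list, "ipynb")`. Note that files sharing the same base name would overwrite one another in the destination directory.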

$\Xi$