# <a id='toc1_'></a>[COMP712 Classical Artificial Intelligence](#toc0_)

# <a id='toc2_'></a>[Workshop: Data Science using Python](#toc0_)

Dr Daniel Zhang @ Falmouth University\
2023-2024 Study Block 1

<div id="top"></div>

# Table of contents<a id='top'></a><a id='toc0_'></a>    
- [COMP712 Classical Artificial Intelligence](#toc1_)    
- [Workshop: Data Science using Python](#toc2_)    
  - [Introduction](#toc3_1_)    
  - [Jupyter Notebook Basics](#toc3_2_)    
  - [Cells](#toc3_3_)    
- [Test TOC](#toc4_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc3_1_'></a>[Introduction](#toc0_)

[Top](#top)

Welcome to this workshop on Data Science using Python! If this notebook has launched successfully, all the necessary prerequisites are in place and you're ready to begin.

## <a id='toc3_2_'></a>[Jupyter Notebook Basics](#toc0_)

[Top](#top)

JupyterLab, the successor of Jupyter Notebook, is a powerful interactive development environment (IDE) that revolutionises the way we work with data, code, and visualisations. At its core, JupyterLab provides a flexible and user-friendly interface that combines various elements crucial for data analysis and scientific computing. 

JupyterLab provides an interactive notebook interface where you can write and execute code, add text explanations using markdown, and visualise results in real-time. This enables seamless transitions between various tasks, promoting a highly interactive and collaborative workflow.

At the heart of JupyterLab lies the concept of kernels, which are computational engines responsible for executing code within notebooks. This architecture allows users to work with different programming languages within the same interface. Users can choose from various kernels, such as `Python`, `R`, `Julia`, and others, making JupyterLab a versatile environment for multi-language development and analysis. Overall, JupyterLab's simplicity, versatility, and integration of different tools make it an indispensable platform for data scientists, researchers, and educators seeking an intuitive yet powerful environment for data exploration and analysis.

# The Notebook Elements

[Top](#top)

After you open a notebook, the file panel on the left remains unchanged. The main area displays the notebook's contents and is where you interact with the Python kernel.

## The Main Toolbar

[Top](#top)

The JupyterLab main toolbar consists of buttons that provide quick access to essential functionalities, as shown below.

![JupyterLab Toolbar Buttons](img/jupyterlab_toolbar.png)

Here's an overview of some common toolbar buttons:

1. `Save`: The floppy disk icon represents the "Save" button, allowing users to save changes made to the notebook. Clicking this button or using the shortcut <kbd>Ctrl + S</kbd> (or <kbd>Cmd + S</kbd> on Mac) saves the notebook.

2. `Add Cell`: This button adds a new cell to the notebook. Clicking the "<kbd>+</kbd>" icon creates a new cell below the currently selected cell.

3,4,5. `Cut`, `Copy`, `Paste`: The scissors, copy, and clipboard icons respectively perform cut, copy, and paste operations on cells within the notebook.

6. `Run`: The "Run" button executes the code in the currently selected cell. Pressing this button or using <kbd>Shift + Enter</kbd> runs the cell and displays the output below the code cell.

7. `Interrupt Kernel`: The square "stop" icon is used to interrupt or halt the execution of code cells. It stops the execution of a cell that's taking too long to run or is stuck in an infinite loop.

8. `Restart Kernel`: The circular arrow icon restarts the kernel. Restarting the kernel resets the computational state, clearing all variables and previously executed code. Use this button cautiously, as it resets the notebook's memory.

9. `Restart Kernel and Run All Cells`: The fast-forward icon restarts the kernel and executes all cells in the current notebook one after another. This button resets the notebook's memory as well.

10. `Cell Type`: The dropdown menu allows users to change the cell type (`Code`, `Markdown`, and `Raw`) of the selected cell, which will be explained below in detail.
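
The effect of `Restart Kernel` (items 8 and 9 above) can be illustrated with a toy model. This is only a sketch: `run_cell` and `kernel_namespace` are hypothetical names, and a real kernel is a separate process rather than a dictionary. The point is that cells share one namespace, and a restart throws that namespace away.

```python
# A toy model of kernel state: every "cell" runs in the same namespace.
# This only mimics the behaviour; it is not how JupyterLab is implemented.
kernel_namespace = {}

def run_cell(source, namespace):
    """Execute a cell's source code inside the given namespace."""
    exec(source, namespace)

run_cell("x = 21", kernel_namespace)       # first cell defines x
run_cell("y = x * 2", kernel_namespace)    # a later cell can still see x
first_result = kernel_namespace["y"]
print(first_result)

kernel_namespace = {}                      # "restart": all variables are gone
try:
    run_cell("y = x * 2", kernel_namespace)
    restart_error = None
except NameError as err:
    restart_error = err
    print(f"After restart: {err}")
```

After a restart, any cell that relies on earlier definitions must be preceded by re-running the cells that create them, which is what `Restart Kernel and Run All Cells` does in one step.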


There is a kernel status indicator area on the notebook's top-right corner. This indicator displays the status of the kernel (the computational engine) associated with the notebook. It shows whether the kernel is idle, busy executing code, or has encountered an error, as shown below.

![Kernel Status](img/jupyterlab_kernel.png)

## <a id='toc3_3_'></a>[Cell Types](#toc0_)

[Top](#top)

Inside the notebook, you'll find cells where you can write code or text. The current cell can be executed by pressing <kbd>Shift + Enter</kbd> or the `Run` button in the main toolbar. When you select a cell, some extra shortcut buttons appear at its top-right corner. These buttons can be used to move the cell up or down, duplicate it, add a new cell above or below it, or delete it from the notebook.

In JupyterLab, there are primarily three types of cells: `Code`, `Markdown`, and `Raw`.

### **Code Cells**

[Top](#top)

These cells are used to write and execute code. When you enter Python, R, Julia, or any other supported language's code into a code cell, you can run it by pressing <kbd>Shift + Enter</kbd>. The output of the code appears directly below the cell. Code cells are where you perform computations, define functions, import libraries, and execute algorithms. They are the core components for interactive programming within Jupyter notebooks.

For example, the following cell is a **`code`** cell that displays the information of your machine and operating system. 

> **Note**: `%%time` is a useful magic command that I use a lot. Placed on the first line of any `code cell`, it measures the running time of the whole cell.

In [5]:
%%time

import platform
print('System information: ' + platform.machine() + '-' + platform.system()  + '-' + platform.version())

System information: AMD64-Windows-10.0.22621
CPU times: total: 0 ns
Wall time: 0 ns
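
`%%time` is an IPython cell magic, so it is not available in a plain Python script. As a rough equivalent, the sketch below times a single call with the standard library's `time.perf_counter`. The helper name `timed` is made up for this illustration.

```python
import time

def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Time a simple computation: the sum of the first million integers.
total, seconds = timed(sum, range(1_000_000))
print(f"sum = {total}, wall time = {seconds * 1000:.2f} ms")
```

Very short pieces of work, like the `platform` calls above, may finish faster than the timer can resolve, which is one plausible reason for a reported time of zero.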


> **Note**: The output of a cell can be removed by right-clicking on the cell and selecting `Clear Cell Output` from the context menu. To clear every output block in the current notebook, select `Clear Outputs of All Cells` instead.

### **Markdown Cell** 

[Top](#top)

Markdown cells are used for text explanations, formatted documentation, and commentary within the notebook. Markdown is a lightweight markup language that allows users to add formatted text, headings, lists, images, hyperlinks, and more. Its simple syntax makes it easy to add context, explanations, or instructions alongside code. For instance, most of this workshop's material is written in `Markdown` cells.

### **Raw Cell**

[Top](#top)

Raw cells are rarely used. They hold unformatted content that should not be executed: the text is stored in the notebook exactly as typed and is neither run as code nor rendered as Markdown. This can be useful for including raw data or annotations that don't require formatting.

An example of a `Raw` cell is shown below. In practice, you might not use it very often, or at all, as the combination of `Code` and `Markdown` cells can already fulfil our requirements.
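
To make the three cell types more concrete, here is a sketch of how they are stored. A notebook file (`.ipynb`) is JSON, and each cell records its type in a `cell_type` field. The notebook below is hand-written for illustration and its contents are made up; it is not this workshop's notebook.

```python
import json
from collections import Counter

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A heading"]},
        {"cell_type": "code", "execution_count": None, "metadata": {},
         "outputs": [], "source": ["print('hello')"]},
        {"cell_type": "raw", "metadata": {}, "source": ["kept exactly as typed"]},
    ],
}

text = json.dumps(notebook, indent=1)   # roughly what a .ipynb file contains
loaded = json.loads(text)

# Count how many cells of each type the notebook holds.
cell_counts = Counter(cell["cell_type"] for cell in loaded["cells"])
print(dict(cell_counts))
```

Only the `code` cell carries `outputs` and an `execution_count`; the `markdown` and `raw` cells are just text with a type label.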

# Section 1: Working with NumPy

[Top](#top)

`NumPy` is the core library for scientific computing in Python. It provides high-performance arrays and matrices for efficient data manipulation. Its capabilities were covered in the lecture session, so the tasks below are set up for you to practise.

For consistency, `NumPy` is imported as `np` for the rest of this notebook. A helper function, `print_array`, is also defined below.

In [29]:
%%time

# importing numpy library 
import numpy as np 
print(f'NumPy version: {np.__version__}')

def print_array(arr):
    ''' Print a 2D NumPy array or Python list of lists of character codes as text '''
    if not isinstance(arr, (np.ndarray, list)):
        print('ERROR: the input must be either a NumPy ndarray or a Python 2D list')
        return
    if isinstance(arr, np.ndarray) and arr.ndim != 2:
        print(f'ERROR: the input NumPy array must be a 2D array, current dim = {arr.ndim}')
        return
    if isinstance(arr, list):
        if len(arr) == 0:
            return
        if not isinstance(arr[0], list):
            print('ERROR: the input Python list must be a list of lists')
            return
    # Each element is a character code; chr() turns it back into a character
    for row in arr:
        print(''.join([chr(c) for c in row]))

NumPy version: 1.26.1
CPU times: total: 0 ns
Wall time: 1.01 ms
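
`print_array` relies on the mapping between characters and integer code points. The following sketch shows that round trip on a tiny example. The `words` list is made up for illustration and assumes all rows have the same length, so that the result is a regular 2D array.

```python
import numpy as np

words = ["NumPy", "array"]                      # illustrative rows of equal length

# ord() maps each character to its integer code point.
codes = np.array([[ord(ch) for ch in word] for word in words])
print(codes.shape)

# chr() reverses the mapping, row by row, as print_array does.
decoded = ["".join(chr(c) for c in row) for row in codes]
print(decoded)
```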


## Tasks

[Top](#top)

1. Create `NumPy` arrays, desired output:



In [23]:
a = [[1,2],[3,4]]
print_array(a)





In [26]:
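# Read the banner text; readlines() keeps the trailing '\n' of each line
with open('banner.txt','r') as fin:
    lines = fin.readlines()
# Convert every character (newline included) to its character code
arr = [[ord(c) for c in row] for row in lines]
gt = np.array(arr)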
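
One detail of the cell above is worth a closer look: `readlines()` keeps the newline character at the end of each line, so every row of the array gains an extra column holding `ord('\n')`, which is 10. The sketch below demonstrates this with an in-memory stand-in for `banner.txt`. The three-line content is hypothetical and is not the real banner.

```python
import io
import numpy as np

# Stand-in for banner.txt: three lines of equal width (hypothetical content).
fake_file = io.StringIO("MMM\nM.M\nMMM\n")
lines = fake_file.readlines()

# Keeping the newline adds one column; stripping it removes that column.
with_newline = np.array([[ord(c) for c in row] for row in lines])
without_newline = np.array([[ord(c) for c in row.rstrip("\n")] for row in lines])

print(with_newline.shape)       # one extra column holding the newline code
print(without_newline.shape)
print(with_newline[:, -1])      # the newline codes in the last column
```

When such an array is printed row by row, the stored newline plus the one added by `print` would produce a blank line after every row.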

In [27]:
len(gt)

102

In [28]:
print_array(gt)

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM

In [32]:
arr = np.copy(gt)
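
`np.copy` matters here because a plain assignment would not duplicate the data. The sketch below contrasts the two on a small made-up array: changes to the copy leave the original alone, while changes through an alias do not.

```python
import numpy as np

original = np.array([[77, 77], [77, 77]])    # 77 is ord('M')
alias = original                             # same object, no data copied
duplicate = np.copy(original)                # independent data

duplicate[0, 0] = 32                         # 32 is ord(' ')
print(original[0, 0])                        # the original is untouched

alias[1, 1] = 46                             # 46 is ord('.')
print(original[1, 1])                        # the alias writes through

print(np.shares_memory(original, duplicate))
```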

In [None]:
type(arr)

In [34]:
arr.shape


(102, 301)

In [36]:
new_arr = np.ones((arr.shape[0]*2,arr.shape[1]*2))


In [37]:
new_arr.shape

(204, 602)
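
Note that `np.ones` creates a `float64` array by default, and `chr()` rejects floats, so an array built this way cannot be passed to `print_array` as it stands. The sketch below shows one way to request an integer array instead. The nested `np.repeat` call at the end is only an assumption about what the doubled shape might be used for (enlarging the banner); the actual task may differ.

```python
import numpy as np

small = np.array([[77, 46], [46, 77]])              # illustrative 2x2 "banner"
target_shape = (small.shape[0] * 2, small.shape[1] * 2)

float_canvas = np.ones(target_shape)                # default dtype is float64
int_canvas = np.ones(target_shape, dtype=small.dtype)
print(float_canvas.dtype, int_canvas.dtype)

# One possible way to fill the larger array: repeat every row and column twice.
scaled = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
print(scaled.shape)
print(scaled)
```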