<a href="https://colab.research.google.com/github/n0hats/ai_learning/blob/main/Chapter_1/data_augmentation_with_python_chapter_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/duchaba/Data-Augmentation-with-Python/blob/main/data_augmentation_with_python_chapter_1.ipynb)

# 🌻 Welcome to Chapter 1, Data Augmentation Made Easy


---

I am glad to see you using this Python Notebook. 🐕

The Python Notebook is an integral part of the book. You can add new “code cells” to extend the functions, add your data, and explore new possibilities, such as downloading additional real-world datasets from the Kaggle website and coding the **Fun challenges**. Furthermore, the book has **Fun facts**, in-depth discussions about augmentation techniques, and Pluto, an imaginary Siberian Husky coding companion. Together they will guide you every step of the way.

Pluto encourages you to save a copy of this Python Notebook to your local space and add “text cells” to keep your notes. In other words, read the book and copy the relevant concepts into this Python Notebook’s text cells. That way, you have the explanations, notes, original code, your code, and any crazy future ideas in one place.


💗 I hope you enjoy reading the book and hacking code as much as I enjoyed writing it.


## 🌟 Amazon Book

---

- The book is available on the Amazon Book website:
  - https://www.amazon.com/dp/1803246456

  - Author: Duc Haba
  - Published: 2023
  - Page count: 390+


- The original Python Notebook is on:
  - https://github.com/PacktPublishing/Data-Augmentation-with-Python/blob/main/Chapter_1/data_augmentation_with_python_chapter_1.ipynb

- 🚀 Click on the blue "Open in Colab" button at the top of this page to begin hacking.


# 😀 Excerpt from Chapter 1, Data Augmentation Made Easy

---

> In case you haven’t bought the book, here is a teaser from the first page of Chapter 1.

---

Data augmentation is essential for developing a successful Deep Learning (DL) project. However, data scientists and developers often overlook this crucial step. It is no secret that you will spend the majority of the project time gathering, cleaning, and augmenting the dataset in a real-world DL project. Thus, learning how to expand the dataset without purchasing new data is essential. This book covers standard and advanced techniques for extending image, text, audio, and tabular datasets. Furthermore, there is a discussion on data biases, and the coding lessons are on Jupyter Python Notebooks.

Chapter 1 introduces the data augmentation concepts, sets up the coding environment, and creates the foundation class. Later chapters explain the techniques in detail, including Python coding. The effective use of data augmentation is often the difference between success and failure in Machine Learning (ML). Many real-world ML projects stay in the conceptual phase because of insufficient data for training the ML model. Data augmentation is a cost-effective technique to increase the dataset, lower the training error rate, and produce more accurate predictions and forecasts.

---
>**Fun fact**

>The car gasoline analogy is helpful for students who first learn about data augmentation and AI. You can think of data for the AI engine as the gasoline and data augmentation as the additive, like the Chevron Techron fuel cleaner, that makes your car engine run faster, smoother, and further without extra petrol.

---

In Chapter 1, we’ll define the data augmentation role and the limitation of how much to extend the data without changing the data integrity. We’ll briefly discuss the different types of input data, such as image, text, audio, and tabular data, and the challenges in supplementing the data. Finally, we’ll set up the system requirements and the programming style in the accompanying Jupyter Python Notebook.   

I designed this book to be a hands-on journey. It will be more effective to read a chapter, run the code, reread the parts of the chapter that confused you, and jump back to hacking the code until the concept or technique is firmly understood.

You are encouraged to change or add new code to the Python Notebooks. The primary purpose is interactive learning. Thus, if something goes horribly wrong, download a fresh copy from the book’s GitHub repository. The surest method to learn is to make mistakes and create something new.

Data augmentation is an iterative process. There is no fixed recipe. In other words, depending on the dataset, you select augmentation functions and jiggle the parameters. A subject domain expert may provide insight into how much distortion is acceptable. By the end of Chapter 1, you will learn the general rules for data augmentation, what types of input data can be augmented, the programming style, and how to set up a Python Jupyter Notebook online or offline.

In particular, Chapter 1 will cover the following primary topics:

- Data augmentation role

- Data input types

- Python Jupyter Notebook

- Programming styles

Let’s start with the data augmentation role.

---

🌴 *end of excerpt from the book*
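
The excerpt describes augmentation as selecting functions and jiggling their parameters. Here is a minimal sketch of that idea, not code from the book. The helper names `flip_horizontal()` and `add_noise()`, the tiny 2x3 "image", and the `strength` parameter are all made up for illustration, and it uses only the Python standard library.

```python
import random

def flip_horizontal(image):
  """Return a copy of a 2-D list of pixels with every row reversed."""
  return [list(reversed(row)) for row in image]

def add_noise(image, strength, seed=0):
  """Return a copy with Gaussian noise added, clamped to the 0..255 range.

  `strength` is the standard deviation of the noise (hypothetical knob).
  """
  rng = random.Random(seed)
  return [
    [min(255, max(0, round(px + rng.gauss(0, strength)))) for px in row]
    for row in image
  ]

# a hypothetical 2x3 grayscale image
tiny_image = [[0, 64, 128], [32, 96, 255]]

flipped = flip_horizontal(tiny_image)
noisy = add_noise(tiny_image, strength=5.0)
print(flipped)
print(noisy)
```

Raising or lowering `strength` is the "jiggle the parameters" step. How much distortion is acceptable depends on the dataset, which is where the domain expert comes in.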


## Programming styles

In [None]:
# git version should be 2.17.1 or higher
!git --version

In [None]:
# clone the official code

url = 'https://github.com/PacktPublishing/Data-Augmentation-with-Python'
!git clone {url}

In [None]:
# set chapter 1 name

pluto_chapter_1 = './Data-Augmentation-with-Python/pluto/pluto_chapter_1.py'

In [None]:
# %%writefile {pluto_chapter_1}

# AI function documentation
#
# prompt: write documentation for the following function: add_method()
# Note: B-grade results. Repeat the prompt above for one function at a time.
# Giving it multiple functions will confuse it.

# create an object
# First, import the basic libraries
import torch
import pandas
import numpy
import matplotlib
import pathlib
import PIL
import datetime
import sys
import psutil
# create class/object
class PacktDataAug(object):
  """
    The PacktDataAug class is the base class for the
    "Data Augmentation with Python" book.
  """
  #
  # initialize the object
  def __init__(self, name="Pluto", is_verbose=True,*args, **kwargs):
    """

    This is the constructor function.

    Args:

      name (str): The name of the object. The default is 'Pluto'.
      is_verbose (bool): The default value is True. When True, the
        constructor prints the class, the code name, and the author.
        This is used to debug code. When you are ready to deploy the
        model, set `is_verbose=False` to avoid printing diagnostic
        messages.

      Additionally, this function accepts any number of other
      parameters (`*args` and `**kwargs`) and forwards them
      unchanged to the parent class constructor.
      Note that `__init__()` is
      automatically called when you create a new object.

    Returns:
      None.
    """
    super(PacktDataAug, self).__init__(*args, **kwargs)
    self.author = "Duc Haba"
    self.version = 1.0
    self.name = name
    if (is_verbose):
      self._ph()
      self._pp("Hello from class", f"{self.__class__} Class: {self.__class__.__name__}")
      self._pp("Code name", self.name)
      self._pp("Author is", self.author)
      self._ph()
    #
    return
  #
  # pretty print output name-value line
  def _pp(self, a, b):
    """

      pretty print output name-value line

      Args:
          a (str): Name of key
          b (any): value of key

      Returns:
          None
    """
    print("%28s : %s" % (str(a), str(b)))
    return
  #
  # pretty print the header or footer lines
  def _ph(self):
    """
      pretty print the header or footer lines

      Args:
          None

      Returns:
          None
      """
    print("-" * 28, ":", "-" * 28)
    return
# ---end of class
#
# Hack it! Add new decorator
# add_method() is inspired by Michael Garod's blog,
# AND correction by: Филя Усков
#
import functools
def add_method(x):
  """

    Decorator factory that adds a function to class `x` as a
    method with the same name and parameters as function `z`.
    Args:
        x: class that receives the new method
    Returns:
        a decorator that takes function `z`, attaches it to
        class `x`, and returns `z` unchanged
  """
  def dec(z):
    @functools.wraps(z)
    def y(*args, **kwargs):
      return z(*args, **kwargs)
    setattr(x, z.__name__, y)
    return z
  return dec
#
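
Here is a minimal, self-contained sketch of how a decorator like `add_method()` attaches a function to a class after the class is defined. The `Demo` class and the `greet()` function are hypothetical names made up for this illustration. The decorator mirrors the one above, with longer variable names.

```python
import functools

def add_method(cls):
  """Return a decorator that attaches the decorated function to `cls`."""
  def decorator(func):
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
      return func(*args, **kwargs)
    setattr(cls, func.__name__, wrapper)
    return func
  return decorator

# hypothetical class, for illustration only
class Demo(object):
  def __init__(self, name):
    self.name = name

# the object exists BEFORE the new method is added
demo = Demo("Pluto")

@add_method(Demo)
def greet(self, greeting="Hello"):
  """Return a greeting that includes the object's name."""
  return f"{greeting}, {self.name}!"

message = demo.greet()
print(message)
print(Demo.greet.__name__)
print(Demo.greet.__doc__)
```

Because the method is set on the class, objects created earlier pick it up too. Since `functools.wraps()` copies the docstring, `help()` should show the documentation you wrote for the function.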

In [None]:
# %%writefile -a {pluto_chapter_1}

# create pluto (or any name you choose)
pluto = PacktDataAug("Pluto")

In [None]:
# %%writefile -a {pluto_chapter_1}

@add_method(PacktDataAug)
def say_sys_info(self):
  """

    Print out system information. Useful for
    debugging purposes. Prints out information such as
    the system time, platform, Python version, PyTorch
    version, Pandas version, PIL version, and
    Matplotlib version. Also prints the number of CPU
    cores and the CPU speed.

    Note that this function is added to the class `PacktDataAug` via
    the decorator `@add_method()`. This means that you can
    call this function as `p.say_sys_info()`,
    where `p` is an instance of `PacktDataAug`.

    Args:
      None

    Returns:
      None
  """
  self._ph()
  now = datetime.datetime.now()
  self._pp("System time", now.strftime("%Y/%m/%d %H:%M"))
  self._pp("Platform", sys.platform)
  self._pp("Pluto Version (Chapter)", self.version)
  v = sys.version.replace('\n', '')
  self._pp("Python (3.7.10)", f'actual: {v}')
  self._pp("PyTorch (1.11.0)", f'actual: {torch.__version__}')
  self._pp("Pandas (1.3.5)", f'actual: {pandas.__version__}')
  self._pp("PIL (9.0.0)", f'actual: {PIL.__version__}')
  self._pp("Matplotlib (3.2.2)", f'actual: {matplotlib.__version__}')
  #
  try:
    val = psutil.cpu_count()
    self._pp("CPU count", val)
    val = psutil.cpu_freq()
    if val is not None:
      val = val._asdict()
      self._pp("CPU speed",  f'{val["current"]/1000:.2f} GHz')
      self._pp("CPU max speed", f'{val["max"]/1000:.2f} GHz')
    else:
      self._pp("*CPU speed", "NOT available")
  except Exception:
    # CPU details are optional, so skip them if the platform cannot report them
    pass
  self._ph()
  return

In [None]:
pluto.say_sys_info()

In [None]:
# end of chapter 1
print('End of chapter 1')

In [None]:
# extra:

# review the AI documentation for the code.
help(pluto)

In [None]:
# specific AI doc
help(pluto.say_sys_info)

## Export to pure Python code (Optional)

- Add "%%writefile your_file_name.py" to the first code cell that you want to export.

- Add "%%writefile -a your_file_name.py" (-a is for append) to the other code cells that you want to export.

- Keep the "%%writefile" line commented out when you are using the code cells normally.

- Uncomment the "%%writefile" line and run each code cell to export the file.
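
To make the write-then-append idea concrete, here is a rough plain-Python sketch of the effect of the steps above. It is not how the magic command is implemented. The `write_cell()` helper and the two code snippets are hypothetical, and the file goes to a temporary folder so nothing in your workspace is touched.

```python
import pathlib
import tempfile

def write_cell(path, source, append=False):
  """Write `source` to `path`: overwrite by default, or append when asked."""
  mode = "a" if append else "w"
  with open(path, mode, encoding="utf-8") as f:
    f.write(source)

with tempfile.TemporaryDirectory() as tmp:
  target = pathlib.Path(tmp) / "your_file_name.py"
  # first exported cell: like "%%writefile", it starts a fresh file
  write_cell(target, "class Base(object):\n  pass\n")
  # later exported cells: like "%%writefile -a", they add to the end
  write_cell(target, "\ndef helper():\n  return 42\n", append=True)
  exported = target.read_text(encoding="utf-8")

print(exported)
```

The order matters. If a later cell overwrote the file instead of appending, the exported file would lose the class definition from the first cell.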

## Push up all changes (Optional)

- username: [your github username or email]

- password: [use github token]

In [None]:
import os
f = 'Data-Augmentation-with-Python'
# os.chdir(f)
!git add -A
!git config --global user.email "duc.haba@gmail.com"
!git config --global user.name "duchaba"
!git commit -m "update with latest ai doc for with output file"

## Summary

Every chapter begins with the same base class, "PacktDataAug".

✋ FAIR WARNING:

- The code uses long, complete function path names.

- I wrote the code to be easy to understand, not for compactness, fast execution, or cleverness.
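
As a small illustration of the long-path style, here is a sketch that uses only standard library modules. The `stamp` and `notebook` variable names are made up for this example, and the short aliases in the comments are shown only for contrast.

```python
import datetime
import pathlib

# long and complete path: the module name is visible on every call
now = datetime.datetime.now()
stamp = now.strftime("%Y/%m/%d %H:%M")

notebook = pathlib.Path("Chapter_1") / "data_augmentation_with_python_chapter_1.ipynb"

# a more compact style would hide where the names come from, e.g.
#   from datetime import datetime as dt
#   from pathlib import Path as P
#   now = dt.now()

print(stamp)
print(notebook.suffix)
```

The long form takes more typing, but you can tell at a glance which library each call belongs to.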



## 🙅
- Extra, extra ...
- These are page/book cleanup routines, so run them with extreme caution.

In [None]:
# # do the git push in the xterm console
# #!git push

# !pip install colab-xterm
# %load_ext colabxterm
# %xterm

In [None]:
# prompt: connect to google drive

from google.colab import drive
drive.mount('/content/drive')

In [None]:
# # copy latest to google drive so that we can load/open new notebook.
# import os
# dest = '/content/drive/MyDrive/"Colab Notebooks"/book/Data-Augmentation-with-Python'
# os.makedirs(dest, exist_ok=True)
# src = '/content/Data-Augmentation-with-Python/*'
# !cp -fr {src} {dest}

In [None]:
# define file names
orig_file = 'data_augmentation_with_python_chapter_1.ipynb'
orig_path = '/content/Data-Augmentation-with-Python/Chapter_1/'
this_file = f'/content/drive/MyDrive/"Colab Notebooks"/{orig_file}'
#
# Pick one below
#
local_file = f'{orig_path}{orig_file}'
# local_file = f'{orig_path}data_augmentation_with_python_chapter_1_with_output.ipynb'

In [None]:
# # STEP 1: copy local drive to google drive (this file)
# # this overwrites the file, so you need to reload this page afterward
# #
# !cp -f {local_file} {this_file}

In [None]:
# STEP 2: copy (this file) to local drive for sync with github.
# be sure to save the latest version first
#
!cp -f {this_file} {local_file}