<a href="https://colab.research.google.com/github/haris18896/Python-Data-Analysis/blob/main/07_Functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Functions

## What is a Function

### Notes

* A **function** is a block of code that only runs when it's called.
* You can pass data (called **arguments**) into a function; the names that receive them in the definition are called **parameters**.
* The function can return data as a result.
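
The three points above can be sketched in a few lines. The function name `add_bonus` and its values are illustrative assumptions, not part of the notebook.

```python
def add_bonus(salary, bonus=500):
    """Return the salary with a flat bonus added (hypothetical example)."""
    return salary + bonus

# Nothing inside the function runs until it is called.
result = add_bonus(10000)             # `salary` receives 10000, `bonus` keeps its default
custom = add_bonus(10000, bonus=750)  # the default can be overridden by keyword

print(result, custom)
```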

## Importance

Functions let us reuse code and make it more modular, which is important for complex data analysis and plotting routines.


## Types of Functions

| Type of Function             | Example Function              | Section            |
|------------------------------|-------------------------------|--------------------|
| Built-In functions           | `max()`                       | 1. Getting Started |
| User-defined functions       | `def my_function(): pass`     | 16. Functions      |
| Lambda functions             | `lambda x: x + 1`             | 17. Lambda         |
| Standard Library functions   | `math.sqrt()`                 | 18. Modules        |
| Third-Party Library Functions| `numpy.array()`               | 19. Library        |

Note: We won't be covering Generator, Asynchronous, or Recursive Functions as they are out of scope of Data Analytics.


# [Built-in functions](https://docs.python.org/3/library/functions.html)




In [1]:
help(all)

Help on built-in function all in module builtins:

all(iterable, /)
    Return True if bool(x) is True for all values x in the iterable.
    
    If the iterable is empty, return True.



In [2]:
import types

print([func for func in dir(__builtins__) if isinstance(getattr(__builtins__, func), types.BuiltinFunctionType)])

['__build_class__', '__import__', 'abs', 'aiter', 'all', 'anext', 'any', 'ascii', 'bin', 'breakpoint', 'callable', 'chr', 'compile', 'delattr', 'dir', 'divmod', 'eval', 'exec', 'format', 'getattr', 'globals', 'hasattr', 'hash', 'hex', 'id', 'isinstance', 'issubclass', 'iter', 'len', 'locals', 'max', 'min', 'next', 'oct', 'open', 'ord', 'pow', 'print', 'repr', 'round', 'setattr', 'sorted', 'sum', 'vars']


In [3]:
salary_list = [10000, 120000, 130000, 50000, 197692]

def calculate_salary(salary, rate=.1):
  total_salary = salary * (1 + rate)

  return total_salary

total_salary_list = [calculate_salary(salary) for salary in salary_list]

total_salary_list

[11000.0, 132000.0, 143000.0, 55000.00000000001, 217461.2]

# Lambda

* Anonymous functions, written as a single expression
* Example: `lambda x: x + 1`

In [4]:
mul_two = lambda x: x*2
mul_two(2)

4

In [5]:
(lambda x: x*2)(3)

6

In [6]:
(lambda x, y : x * 2 + y*3)(3, 4)

18

In [7]:
(lambda *args: sum(args))(1,2,3,4,5)

15

In [8]:
(lambda **kwargs: sum(kwargs.values()))(a=1, b=2, c=3)

6

In [9]:
(lambda **kwargs: kwargs.values())(a=1, b=2, c=3)

dict_values([1, 2, 3])

In [10]:
(lambda salary, rate : salary * (1 + rate))(1000, 0.1)

1100.0

In [11]:
total_salary_list = [(lambda x: x * (1 + 0.1))(salary) for salary in salary_list]

total_salary_list

[11000.0, 132000.0, 143000.0, 55000.00000000001, 217461.2]
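
One common use of a lambda is as a short `key` function passed to built-ins such as `sorted` and `max`. The sketch below assumes a small hypothetical `jobs` list with a `salary` field, which does not appear in the notebook.

```python
# Hypothetical data for illustration only
jobs = [
    {"job_title": "Data Engineer", "salary": 120000},
    {"job_title": "Data Analyst", "salary": 90000},
    {"job_title": "Data Scientist", "salary": 130000},
]

# The lambda tells sorted() and max() which value to compare
by_salary = sorted(jobs, key=lambda job: job["salary"])
top_job = max(jobs, key=lambda job: job["salary"])

titles = [job["job_title"] for job in by_salary]
print(titles)
print(top_job["job_title"])
```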

In [12]:
job_data = [
    {
        'job_title': "Data Scientist",
        "job_skills": ["Python", "Machine Learning", "Statistics"],
        "remote": True
    },
     {
         'job_title': "Data Scientist",
        "job_skills": ["SQL", "Data Visualization", "Data Cleaning"],
        "remote": False
    },
    {
        'job_title':"Machine Learning Engineer",
        "job_skills": ["Python", "Machine Learning", "Cloud Computing"],
        "remote": True
    },
     {
        'job_title':"Data Engineer",
        "job_skills": ["Python", "SQL", "Data Warehousing"],
        "remote": False
    },
    {
        'job_title' : "Business Intelligence Analyst",
        "job_skills": ["Excel", "Power BI", "Data Analysis"],
        "remote": True
    }
]


help(filter)

Help on class filter in module builtins:

class filter(object)
 |  filter(function or None, iterable) --> filter object
 |  
 |  Return an iterator yielding those items of iterable for which function(item)
 |  is true. If function is None, return the items that are true.
 |  
 |  Methods defined here:
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __next__(self, /)
 |      Implement next(self).
 |  
 |  __reduce__(...)
 |      Return state information for pickling.
 |  
 |  ----------------------------------------------------------------------
 |  Static methods defined here:
 |  
 |  __new__(*args, **kwargs) from builtins.type
 |      Create and return a new object.  See help(type) for accurate signature.



In [13]:
list(filter(lambda job: job['remote'], job_data))

[{'job_title': 'Data Scientist',
  'job_skills': ['Python', 'Machine Learning', 'Statistics'],
  'remote': True},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': ['Python', 'Machine Learning', 'Cloud Computing'],
  'remote': True},
 {'job_title': 'Business Intelligence Analyst',
  'job_skills': ['Excel', 'Power BI', 'Data Analysis'],
  'remote': True}]

In [14]:
list(filter(lambda job: job['remote'] and 'Python' in job["job_skills"], job_data))

[{'job_title': 'Data Scientist',
  'job_skills': ['Python', 'Machine Learning', 'Statistics'],
  'remote': True},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': ['Python', 'Machine Learning', 'Cloud Computing'],
  'remote': True}]
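
The same selection can also be written as a list comprehension, which many readers find easier to scan than `filter` with a lambda. This is a minimal sketch on a shortened, illustrative copy of `job_data`, so it runs on its own.

```python
# Shortened, illustrative copy of job_data
job_data = [
    {"job_title": "Data Scientist", "job_skills": ["Python", "Statistics"], "remote": True},
    {"job_title": "Data Engineer", "job_skills": ["Python", "SQL"], "remote": False},
    {"job_title": "Business Intelligence Analyst", "job_skills": ["Excel", "Power BI"], "remote": True},
]

via_filter = list(filter(lambda job: job["remote"] and "Python" in job["job_skills"], job_data))
via_comprehension = [job for job in job_data if job["remote"] and "Python" in job["job_skills"]]

# Both approaches select the same jobs
print(via_filter == via_comprehension)
print([job["job_title"] for job in via_comprehension])
```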

# Module

In [15]:
import my_module

my_module.skill_list



['python', 'sql', 'java']

In [16]:
my_module.skill('python')

'python is my favourite skill'

In [17]:
from job_analyzer import calculate_salary, calculate_bonus

calculate_salary(100)

# calculate_bonus(1100, 1000)

110.00000000000001
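
The trailing digits in `110.00000000000001` come from binary floating-point arithmetic (0.1 has no exact binary representation), not from a bug in `calculate_salary`. Below is a sketch of two ways to get a clean figure, assuming the same `salary * (1 + rate)` formula: rounding for display, or the standard library's `decimal` module for exact decimal arithmetic.

```python
from decimal import Decimal

salary = 100
rate = 0.1

raw = salary * (1 + rate)   # float arithmetic, as in the cell above
rounded = round(raw, 2)     # round for display or reporting

# Decimal built from a string keeps 0.1 exact
exact = Decimal(salary) * (1 + Decimal("0.1"))

print(raw, rounded, exact)
```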

In [18]:
help(calculate_salary)

Help on function calculate_salary in module job_analyzer:

calculate_salary(salary, rate=0.1)
    Calculate the total salary based on the base salary and bonus
    
    Args:
    salary (Float): base Salary.
    rate (Float): The bonus rate. Default is .1
    
    Returns:
    float: The total salary



In [19]:
salary_list = [9800, 1000, 5670, 1234, 4321]

import statistics

statistics.mean(salary_list)

4405

In [20]:
help(statistics)

Help on module statistics:

NAME
    statistics - Basic statistics module.

MODULE REFERENCE
    https://docs.python.org/3.10/library/statistics.html
    
    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module provides functions for calculating statistics of data, including
    averages, variance, and standard deviation.
    
    Calculating averages
    --------------------
    
    Function            Description
    mean                Arithmetic mean (average) of data.
    fmean               Fast, floating point arithmetic mean.
    geometric_mean      Geometric mean of data.
    harmonic_mean       Harmonic mean of data.
    median              Median (middle value) of data.
    median_low  

In [21]:
from statistics import mean, median, mode

mean(salary_list)

4405

In [22]:
median(salary_list)

4321

In [23]:
mode(salary_list)

9800
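
Every value in `salary_list` occurs exactly once, so `mode` simply returns the first value it encounters. This is the behavior in Python 3.8 and later; earlier versions raised `StatisticsError` in this case. A short sketch using `statistics.multimode`, which lists all tied values:

```python
from statistics import mode, multimode

salary_list = [9800, 1000, 5670, 1234, 4321]

first_mode = mode(salary_list)        # first of the tied values
all_modes = multimode(salary_list)    # every value that ties for most common

# With a repeated value, the mode is unambiguous
repeated = mode([1000, 5670, 1000, 4321])

print(first_mode, all_modes, repeated)
```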

In [24]:
data_science_jobs = [
    {'job_title': 'Data Scientist', 'job_skills': "['Python', 'SQL', 'Machine Learning']", 'job_date': '2023-05-12'},
    {'job_title': 'Machine Learning Engineer', 'job_skills': "['Python', 'TensorFlow', 'Deep Learning']", 'job_date': '2023-05-15'},
    {'job_title': 'Data Analyst', 'job_skills': "['SQL', 'R', 'Tableau']", 'job_date': '2023-05-10'},
    {'job_title': 'Business Intelligence Developer', 'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']", 'job_date': '2023-05-08'},
    {'job_title': 'Data Engineer', 'job_skills': "['Python', 'Spark', 'Hadoop']", 'job_date': '2023-05-18'},
    {'job_title': 'AI Specialist', 'job_skills': "['Python', 'PyTorch', 'AI Ethics']", 'job_date': '2023-05-20'}
]

In [25]:
from datetime import datetime, date

datetime.now()


datetime.datetime(2024, 6, 9, 11, 38, 30, 880438)

In [26]:
type(data_science_jobs[0]["job_date"])

str

In [27]:
print(datetime.strptime(data_science_jobs[0]["job_date"], "%Y-%m-%d"))

2023-05-12 00:00:00
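
Once the string is parsed, the resulting `datetime` object exposes its parts directly and can be formatted back into text with `strftime`. A minimal sketch using the same date and format string as above:

```python
from datetime import datetime

job_date = datetime.strptime("2023-05-12", "%Y-%m-%d")

year, month, day = job_date.year, job_date.month, job_date.day
weekday = job_date.weekday()               # Monday is 0, Sunday is 6
as_text = job_date.strftime("%Y-%m-%d")    # back to the original string form
date_only = job_date.date()                # drops the 00:00:00 time part

print(year, month, day, weekday, as_text, date_only)
```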


In [28]:
for job in data_science_jobs:
  job["job_date"] = datetime.strptime(job["job_date"], "%Y-%m-%d")

In [29]:
data_science_jobs

[{'job_title': 'Data Scientist',
  'job_skills': "['Python', 'SQL', 'Machine Learning']",
  'job_date': datetime.datetime(2023, 5, 12, 0, 0)},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': "['Python', 'TensorFlow', 'Deep Learning']",
  'job_date': datetime.datetime(2023, 5, 15, 0, 0)},
 {'job_title': 'Data Analyst',
  'job_skills': "['SQL', 'R', 'Tableau']",
  'job_date': datetime.datetime(2023, 5, 10, 0, 0)},
 {'job_title': 'Business Intelligence Developer',
  'job_skills': "['SQL', 'PowerBI', 'Data Warehousing']",
  'job_date': datetime.datetime(2023, 5, 8, 0, 0)},
 {'job_title': 'Data Engineer',
  'job_skills': "['Python', 'Spark', 'Hadoop']",
  'job_date': datetime.datetime(2023, 5, 18, 0, 0)},
 {'job_title': 'AI Specialist',
  'job_skills': "['Python', 'PyTorch', 'AI Ethics']",
  'job_date': datetime.datetime(2023, 5, 20, 0, 0)}]

# Abstract Syntax Tree

In [30]:
import ast

for job in data_science_jobs:
  job['job_skills'] = ast.literal_eval(job["job_skills"])

data_science_jobs

[{'job_title': 'Data Scientist',
  'job_skills': ['Python', 'SQL', 'Machine Learning'],
  'job_date': datetime.datetime(2023, 5, 12, 0, 0)},
 {'job_title': 'Machine Learning Engineer',
  'job_skills': ['Python', 'TensorFlow', 'Deep Learning'],
  'job_date': datetime.datetime(2023, 5, 15, 0, 0)},
 {'job_title': 'Data Analyst',
  'job_skills': ['SQL', 'R', 'Tableau'],
  'job_date': datetime.datetime(2023, 5, 10, 0, 0)},
 {'job_title': 'Business Intelligence Developer',
  'job_skills': ['SQL', 'PowerBI', 'Data Warehousing'],
  'job_date': datetime.datetime(2023, 5, 8, 0, 0)},
 {'job_title': 'Data Engineer',
  'job_skills': ['Python', 'Spark', 'Hadoop'],
  'job_date': datetime.datetime(2023, 5, 18, 0, 0)},
 {'job_title': 'AI Specialist',
  'job_skills': ['Python', 'PyTorch', 'AI Ethics'],
  'job_date': datetime.datetime(2023, 5, 20, 0, 0)}]
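
`ast.literal_eval` only accepts Python literals (strings, numbers, lists, dicts, and so on), which makes it a safer choice than `eval` for parsing text like the `job_skills` strings. The sketch below shows it parsing a list and rejecting a function call; the second input string is an illustrative assumption.

```python
import ast

raw_skills = "['Python', 'SQL', 'Machine Learning']"
skills = ast.literal_eval(raw_skills)   # str -> list

rejected = False
try:
    # A function call is not a literal, so literal_eval refuses to evaluate it
    ast.literal_eval("__import__('os').getcwd()")
except ValueError:
    rejected = True

print(type(skills), skills, rejected)
```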

# Library

In [31]:
# Use a context manager so the file is closed automatically
with open("sample_data/california_housing_test.csv") as file:
  content = file.read()

In [32]:
import csv

reader = csv.reader(content.strip().split('\n'))
header = next(reader)  # the first row holds the column names
data_dict = {column: [] for column in header}

# Append each value to the list for its column
for row in reader:
  for column, value in zip(header, row):
    data_dict[column].append(value)

data_dict

{'longitude': ['-122.050000',
  '-118.300000',
  '-117.810000',
  '-118.360000',
  '-119.670000',
  '-119.560000',
  '-121.430000',
  '-120.650000',
  '-122.840000',
  '-118.020000',
  '-118.240000',
  '-119.120000',
  '-121.930000',
  '-117.030000',
  '-117.970000',
  '-117.990000',
  '-120.810000',
  '-121.200000',
  '-118.880000',
  '-122.590000',
  '-122.150000',
  '-121.370000',
  '-118.160000',
  '-122.200000',
  '-117.280000',
  '-118.030000',
  '-122.420000',
  '-118.390000',
  '-118.450000',
  '-118.480000',
  '-119.350000',
  '-118.300000',
  '-121.130000',
  '-118.080000',
  '-118.320000',
  '-118.110000',
  '-122.530000',
  '-118.020000',
  '-118.050000',
  '-119.010000',
  '-119.320000',
  '-116.920000',
  '-118.060000',
  '-117.270000',
  '-118.230000',
  '-117.240000',
  '-121.910000',
  '-118.290000',
  '-121.350000',
  '-117.990000',
  '-120.990000',
  '-119.420000',
  '-122.210000',
  '-118.170000',
  '-117.900000',
  '-117.990000',
  '-121.420000',
  '

In [33]:
sum(float(room.replace(',', '')) for room in data_dict['total_rooms'] if room.replace(' ', '').replace('.', '').isdigit())

7798736.0
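
The standard library's `csv.DictReader` handles the header row itself and yields one dictionary per data row, so a column can be summed without checking each value for digits. This is a sketch on a small in-memory sample; the `sample` string is an illustrative stand-in for the file contents.

```python
import csv
import io

# Illustrative two-row sample standing in for the CSV file
sample = "longitude,total_rooms\n-122.050000,3885.000000\n-118.300000,1510.000000\n"

# DictReader uses the first row as keys for every following row
rows = list(csv.DictReader(io.StringIO(sample)))
total_rooms = sum(float(row["total_rooms"]) for row in rows)

print(rows[0])
print(total_rooms)
```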

In [34]:
import pandas as pd

contents = pd.read_csv("sample_data/california_housing_test.csv")

contents

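|      | longitude | latitude | housing_median_age | total_rooms | total_bedrooms | population | households | median_income | median_house_value |
|------|-----------|----------|--------------------|-------------|----------------|------------|------------|---------------|--------------------|
| 0    | -122.05   | 37.37    | 27.0               | 3885.0      | 661.0          | 1537.0     | 606.0      | 6.6085        | 344700.0           |
| 1    | -118.30   | 34.26    | 43.0               | 1510.0      | 310.0          | 809.0      | 277.0      | 3.5990        | 176500.0           |
| 2    | -117.81   | 33.78    | 27.0               | 3589.0      | 507.0          | 1484.0     | 495.0      | 5.7934        | 270500.0           |
| 3    | -118.36   | 33.82    | 28.0               | 67.0        | 15.0           | 49.0       | 11.0       | 6.1359        | 330000.0           |
| 4    | -119.67   | 36.33    | 19.0               | 1241.0      | 244.0          | 850.0      | 237.0      | 2.9375        | 81700.0            |
| ...  | ...       | ...      | ...                | ...         | ...            | ...        | ...        | ...           | ...                |
| 2995 | -119.86   | 34.42    | 23.0               | 1450.0      | 642.0          | 1258.0     | 607.0      | 1.1790        | 225000.0           |
| 2996 | -118.14   | 34.06    | 27.0               | 5257.0      | 1082.0         | 3496.0     | 1036.0     | 3.3906        | 237200.0           |
| 2997 | -119.70   | 36.30    | 10.0               | 956.0       | 201.0          | 693.0      | 220.0      | 2.2895        | 62000.0            |
| 2998 | -117.12   | 34.10    | 40.0               | 96.0        | 14.0           | 46.0       | 14.0       | 3.2708        | 162500.0           |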


In [35]:
contents['total_bedrooms']

0        661.0
1        310.0
2        507.0
3         15.0
4        244.0
         ...  
2995     642.0
2996    1082.0
2997     201.0
2998      14.0
2999     263.0
Name: total_bedrooms, Length: 3000, dtype: float64

In [36]:
contents['total_bedrooms'].sum()

1589852.0

In [37]:
!pip install pandas



In [38]:
!pip list

Package                          Version
-------------------------------- ---------------------
absl-py                          1.4.0
aiohttp                          3.9.5
aiosignal                        1.3.1
alabaster                        0.7.16
albumentations                   1.3.1
altair                           4.2.2
annotated-types                  0.7.0
anyio                            3.7.1
argon2-cffi                      23.1.0
argon2-cffi-bindings             21.2.0
array_record                     0.5.1
arviz                            0.15.1
astropy                          5.3.4
astunparse                       1.6.3
async-timeout                    4.0.3
atpublic                         4.1.0
attrs                            23.2.0
audioread                        3.0.1
autograd                         1.6.2
Babel                            2.15.0
backcall                         0.2.0
beautifulsoup4                   4.12.3
bidict                           0.23.1

In [39]:
!pip install pyjokes



In [40]:
import pyjokes

pyjokes.get_joke()

'Number of days since I have encountered an off-by-one error: 0.'

# Classes

In [41]:
class HarisList:
  def __init__(self):
    """Initialize an empty list."""
    self.items = []

  def add(self, item):
    "Add an item to the list."
    self.items.append(item)

  def remove(self, item):
    "Remove an item from the list."
    self.items.remove(item)

  def __len__(self):
    "Return the number of items in the list."
    return len(self.items)

  def __getitem__(self, index):
    "Return the item at the given index."
    return self.items[index]

  def __iter__(self):
    "Return an iterator over the items in the list."
    return iter(self.items)

  def __repr__(self):
    "Return a string representation of the list."
    return f"HarisList({self.items})"

In [42]:
my_list = HarisList()

my_list

HarisList([])

In [43]:
my_list.add('Data nerd')

In [44]:
my_list

HarisList(['Data nerd'])

In [45]:
len(my_list)

1
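
Beyond `len()`, the other special methods let the class work with indexing, `for` loops, comprehensions, and the `in` operator. The sketch below repeats a compact copy of `HarisList` (docstrings and `remove` omitted) so it runs on its own; the `skills` values are illustrative.

```python
class HarisList:
    """Compact copy of the class above, repeated so this example is self-contained."""

    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def __iter__(self):
        return iter(self.items)


skills = HarisList()
skills.add("python")
skills.add("sql")

first = skills[0]                            # uses __getitem__
upper = [skill.upper() for skill in skills]  # uses __iter__
has_sql = "sql" in skills                    # falls back to __iter__ (no __contains__ defined)
count = len(skills)                          # uses __len__

print(first, upper, has_sql, count)
```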

In [61]:
class BaseSalary:
  def __init__(self, base_salary, bonus_rate=.1, symbol="$"):
    """
    Initialize a BaseSalary object with the given base salary, bonus rate, and symbol.

    Args:
      base_salary (int): The base salary amount.
      bonus_rate (float): The bonus rate.
      symbol (str): The currency symbol.
    """
    self.base_salary = base_salary
    self.bonus_rate = bonus_rate
    self.symbol = symbol
    self.total_salary = base_salary * (1 + bonus_rate)
    self.bonus = self.total_salary - base_salary

  def __repr__(self):
    return f"{self.symbol}{self.base_salary:,.0f}"

  def show_salary(self):
    """
    Format the total salary (base salary plus bonus) as a currency string.

    Returns:
    str: The total salary with the currency symbol, e.g. '$13,000'.
    """
    return f"{self.symbol}{self.total_salary:,.0f}"


  def show_bonus(self):
    """
    Format the bonus (total salary minus base salary) as a currency string.

    Returns:
    str: The bonus amount with the currency symbol, e.g. '$3,000'.
    """
    return f"{self.symbol}{self.bonus:,.0f}"


In [62]:
salary = BaseSalary(10000, 0.3)

In [64]:
salary.show_salary()

'$13,000'

In [63]:
salary.show_bonus()

'$3,000'
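
`BaseSalary` computes `total_salary` and `bonus` once in `__init__`, so changing `bonus_rate` on an existing object would leave those values unchanged. One possible alternative, sketched below, derives them on demand with `@property`. The class name `SalaryWithProperties` is hypothetical and not part of the notebook.

```python
class SalaryWithProperties:
    """Hypothetical variant of BaseSalary that recomputes derived values on access."""

    def __init__(self, base_salary, bonus_rate=0.1, symbol="$"):
        self.base_salary = base_salary
        self.bonus_rate = bonus_rate
        self.symbol = symbol

    @property
    def bonus(self):
        # Recomputed every time, so it always reflects the current bonus_rate
        return self.base_salary * self.bonus_rate

    @property
    def total_salary(self):
        return self.base_salary + self.bonus

    def show_salary(self):
        return f"{self.symbol}{self.total_salary:,.0f}"


salary = SalaryWithProperties(10000, 0.3)
before = salary.show_salary()

salary.bonus_rate = 0.5          # update the rate after creation
after = salary.show_salary()     # reflects the new rate

print(before, after)
```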