# Python Basics for Data Science 
***
### IntroPython2.1 Python Basics-Operators  
### IntroPython2.2 Python Basics-Variables, Data Types, and Data Type Conversion
### IntroPython2.3 Python Basics-Data Structures
### IntroPython2.4 Python Basics-Built-in Functions and Methods
### IntroPython2.5 Python Basics-Create Our Own Function and Lambda
### IntroPython2.6 Python Basics-If Statement
### IntroPython2.7 Python Basics-Loops
### IntroPython2.8 Python Basics-Import Statement and Important Built-in Modules, Syntax Essentials and Best Practices
***

### Built-in vs. user-defined functions and methods: 
The cool thing is that besides the long list of built-in functions/methods, **you can create your own**.

Note:
- A list of built-in functions in Python: https://docs.python.org/3/library/functions.html

- A list of built-in methods in Python: https://docs.python.org/3/tutorial/datastructures.html

## Create Functions of Our Own and lambda

Functions are common to all programming languages. A function can be defined as a block of reusable code that performs a specific task. Defining functions in Python starts with knowing the two kinds: built-in and user-defined. Built-in functions ship with Python itself (many more come from its standard library and third-party packages), whereas user-defined functions are written by developers to meet specific requirements. In Python, all functions are objects, so they can be assigned to variables, passed as arguments, and returned from other functions.
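As a small sketch of what "functions are objects" means in practice, the example below gives a function a second name, stores functions in a dictionary, and passes one function to another. The names `double`, `apply_twice`, and `operations` are made up for illustration.

```python
def double(x):
    return x * 2

def apply_twice(func, value):
    # func is an ordinary argument that happens to be a function
    return func(func(value))

alias = double                                # a second name for the same function object
operations = {"double": double, "abs": abs}   # functions stored as dictionary values

result_alias = alias(5)                  # 10
result_twice = apply_twice(double, 3)    # 12
result_lookup = operations["abs"](-7)    # 7
print(result_alias, result_twice, result_lookup)
```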

### When to Create A Function of Our Own (Importance of user-defined functions in Python)
- Consider writing a function whenever you’ve copied and pasted a block of code **more than twice**
- Extracting repeated code into a function helps prevent **careless mistakes**
- Another advantage is that if the requirements change, you only need to **make the change in one place**
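To illustrate the points above, here is a minimal sketch that replaces a copied formula with one function. The function name `percent_change` and the numbers are invented for this example.

```python
# Copy-and-paste version: the same formula written three times
change_q1 = (120 - 100) * 100 / 100
change_q2 = (60 - 80) * 100 / 80
change_q3 = (75 - 50) * 100 / 50

# Function version: the formula lives in one place
def percent_change(old, new):
    """Return the change from old to new as a percentage of old."""
    return (new - old) * 100 / old

changes = [percent_change(100, 120), percent_change(80, 60), percent_change(50, 75)]
print(changes)  # [20.0, -25.0, 50.0]
```

If the formula ever needs to change (for example, to round the result), only the body of `percent_change` has to be edited.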

In general, developers can write user-defined functions themselves or borrow them from a third-party library. This also means that your own user-defined functions can become a third-party library for other users. User-defined functions have certain advantages, depending on when and how they are used:

- User-defined functions are **reusable** code blocks; they only need to be written once, then they can be used multiple times. They can even be used in other applications, too.
- These functions are very useful, from writing common utilities to specific business logic. These functions can also be modified per requirement.
- The code is usually well organized, easy to maintain, and developer-friendly, which means it supports a modular design approach.
- As user-defined functions can be written independently, the tasks of a project can be distributed for rapid application development.
- A well-defined and thoughtfully written user-defined function can ease the application development process.

### Writing user-defined functions in Python
These are the basic steps in writing user-defined functions in Python. For additional functionality, we can incorporate more steps as needed.
- Step 1: Declare the function with the keyword `def` followed by the function name
- Step 2: Write the arguments inside the opening and closing parentheses `()` of the function, and end the declaration with a colon `:`
- Step 3: Add the program statements to be executed, indented under the declaration
- Step 4: End the function with or without a `return` statement (without one, the function returns `None`)
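The four steps can be sketched as follows. The functions `rectangle_area` and `announce` are hypothetical examples, one ending with `return` and one without.

```python
def rectangle_area(width, height):   # Steps 1 and 2: def, name, arguments, colon
    area = width * height            # Step 3: statements to execute
    return area                      # Step 4: hand the result back to the caller

def announce(width, height):         # same steps, but without a return statement
    print("Area:", width * height)

value = rectangle_area(3, 4)
nothing = announce(3, 4)   # prints "Area: 12"
print(value, nothing)      # 12 None
```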

#### The example below is a typical syntax for defining functions:
**In Python**:
```python
def FuncName(arg1, arg2, ...):
    program statement1
    program statement2
    return result
```
**In R** (for comparison):

```r
FuncName <- function(arg1, arg2, ...) {
    program statement1
    program statement2
    ....
    return(result)
}
```

#### Note: 

#### 1. The general (although simplified) logic of a function is this:

```python
def function_name(arg1, arg2):
    do something1...
    do something2...
    return result
```

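Functions take **arguments** and return results. Strictly speaking, **parameters** are the names listed in the function definition and arguments are the values passed in a call, although the two terms are often used interchangeably. Programmers also use the terminology **inputs** and **outputs**.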

#### 2. Lifetime of variables within functions, aka _scope_
Any variable you create inside a function is local to that function: it disappears as soon as the function exits (finishes running), and it cannot be accessed from outside the function.
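A minimal sketch of scope, using a hypothetical function `compute_total` that is not part of the course code: the variable `subtotal` exists only while the function runs, so referring to it afterwards raises a `NameError`.

```python
def compute_total(prices):
    subtotal = sum(prices)  # local variable: exists only during the call
    return subtotal

total = compute_total([2, 3, 5])
print(total)  # 10

try:
    print(subtotal)  # the local name is gone once the function has returned
except NameError as err:
    message = str(err)
    print("NameError:", message)
```

The returned value survives because it was assigned to `total` outside the function; the local name `subtotal` does not.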


### Let’s try some simple code examples

In [82]:
def func(argu1):
    print(argu1)

In [83]:
func('Hello')

Hello


In [84]:
func("Python for Analytics")

Python for Analytics


In [85]:
def func(name):
    print('Hello '+ name)

In [86]:
func('Mei')

Hello Mei


In [87]:
def func(name='November'):
    print('Hello '+ name)

In [88]:
func()

Hello November


In [89]:
# Refer to the function object itself (no parentheses), without executing it
func

<function __main__.func(name='November')>

In [90]:
func(name='Mary')

Hello Mary


In [91]:
func('Mary')

Hello Mary


In [92]:
# Exercise: add one or more variables

def add_numbers(x, y, z):
    return x+y+z

print(add_numbers(1, 2, 3))

6


In [93]:
# Exercise: multiply 3 variables



### Functions that return a value use the return keyword:

In [94]:
def square(num):
    return num**2

In [95]:
output=square(4)
output

16

In [96]:
def square(num):
    '''
    THIS IS A DOCSTRING.
    CAN DO MULTIPLE LINES.
    THIS FUNCTION SQUARES A NUMBER.
    '''
    return num**2

In [97]:
output=square(2)
output

4

In [98]:
output  # Tip: in Jupyter, press Shift + Tab with the cursor on square to view its docstring

4
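Outside Jupyter, a docstring can still be read programmatically. The sketch below redefines `square` with a shorter docstring (an assumption made to keep the example compact) and reads it through the `__doc__` attribute.

```python
def square(num):
    """Return num squared."""
    return num ** 2

print(square.__doc__)  # Return num squared.
# help(square) prints the same text together with the function signature
```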

In [99]:
def time2(var):
    return var*2

In [100]:
time2(5)

10

### We can return multiple values from a function using tuples:

In [101]:
def powers(x):
    """
    Return a few powers of x.
    """
    return x ** 2, x ** 3, x ** 4 

In [102]:
powers(4)

(16, 64, 256)

In [103]:
x2, x3, x4 = powers(3)
print(x3)

27


In [104]:
seq=[1,2,3,4,5]
map(time2,seq)

<map at 0x20bd8b1b6a0>

In [105]:
list(map(time2,seq))

[2, 4, 6, 8, 10]

In [106]:
seq=[1,2,3,4,5]
map(powers,seq)

<map at 0x20bd8b1b970>

In [107]:
list(map(powers,seq))

[(1, 1, 1), (4, 8, 16), (9, 27, 81), (16, 64, 256), (25, 125, 625)]

In [108]:
# Example: This function calculates the average of two numbers
def average(num1, num2):
    result = (num1 + num2) / 2
    return result  # return the value so that it can be assigned to a variable
# Can we reduce this to a one-line function?

In [109]:
a=average(9,2) # should equal 5.5
a

5.5


In [110]:
# Example: 
def first_word(word):
    multiple_words = word.split(" ")
    return multiple_words[0]

In [111]:
first_word("Hi Zoey and Sandy") # should equal "Hi"

'Hi'

In [112]:
test_word = "hello world"
first_word(test_word) # should equal "hello"

'hello'

In [113]:
first_word("hello") # there is no space to split on, what will this do?

'hello'

In [114]:
# Example:
my_first_name = first_word("Mei Najim Homer Simpson")
print("My first name is",my_first_name)

My first name is Mei


In [115]:
x = 3
y = 2
x + y

5

In [116]:
def add_numbers(x, y):
    return x + y

In [117]:
add_numbers(4, 7)

11

In [118]:
# add_numbers updated to take an optional third parameter
def add_numbers(x, y, z=None):
    if z is None:   # "is None" is the idiomatic test for a missing value
        return x + y
    else:
        return x + y + z

#print(add_numbers(1, 2))
#print(add_numbers(1, 2, 3))

In [119]:
print(add_numbers(1, 2))

3


In [120]:
print(add_numbers(1, 2, 3))

6


In [121]:
#assign function to another name
a=add_numbers

In [122]:
a(16,3)

19

Example: Create a function called times, which returns the product of its two arguments

In [123]:
def times(x,y):     # Create and assign function
    return x*y      # Body executed when called

In [124]:
times(2,4)

8

In [125]:
x=times(3.14,4)    # Save the result object
x

12.56

In [126]:
times('Ni', 4)

'NiNiNiNi'

In [127]:
# Exercise: x*y/z


Exercise: Create a function called division, which divides its first argument by the second and then multiplies the result by the remaining arguments

In [128]:
# Try it here
def division (x,y,z,w):
    return x/y*z*w

In [129]:
division(2,4,2,1)

1.0

In [130]:
# Exercise: Create a function called temperature_conversion to convert Fahrenheit to Celsius



In [132]:
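# Create a function that displays an exploratory data summary
# Note: display() is available automatically in Jupyter; in a plain script,
# import it with "from IPython.display import display"
def df_summary(df, head_size=5, show_info=True):
    '''Display the shape, first rows, correlation matrix and (optionally) info of df.'''
    display(df.shape)
    display(df.head(head_size))
    display(df.corr())
    if show_info:
        # df.info() prints its report and returns None, so display() also shows a trailing None
        display(df.info())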

In [133]:
%pwd

'C:\\Users\\yumei\\CSP Workshop 2023'

In [135]:
import pandas as pd
df=pd.read_csv('C:\\Users\\yumei\\CSP Workshop 2023\\Data\\Wine.csv')
df_summary(df)

(178, 14)

| | Alcohol | Malic_Acid | Ash | Ash_Alcanity | Magnesium | Total_Phenols | Flavanoids | Nonflavanoid_Phenols | Proanthocyanins | Color_Intensity | Hue | OD280 | Proline | Customer_Segment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.8 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 1 |
| 1 | 13.2 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.4 | 1050 | 1 |
| 2 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.8 | 3.24 | 0.3 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 | 1 |
| 3 | 14.37 | 1.95 | 2.5 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.8 | 0.86 | 3.45 | 1480 | 1 |
| 4 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.8 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 | 1 |


| | Alcohol | Malic_Acid | Ash | Ash_Alcanity | Magnesium | Total_Phenols | Flavanoids | Nonflavanoid_Phenols | Proanthocyanins | Color_Intensity | Hue | OD280 | Proline | Customer_Segment |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Alcohol | 1.0 | 0.094397 | 0.211545 | -0.310235 | 0.270798 | 0.289101 | 0.236815 | -0.155929 | 0.136698 | 0.546364 | -0.071747 | 0.072343 | 0.64372 | -0.328222 |
| Malic_Acid | 0.094397 | 1.0 | 0.164045 | 0.2885 | -0.054575 | -0.335167 | -0.411007 | 0.292977 | -0.220746 | 0.248985 | -0.561296 | -0.36871 | -0.192011 | 0.437776 |
| Ash | 0.211545 | 0.164045 | 1.0 | 0.443367 | 0.286587 | 0.12898 | 0.115077 | 0.18623 | 0.009652 | 0.258887 | -0.074667 | 0.003911 | 0.223626 | -0.049643 |
| Ash_Alcanity | -0.310235 | 0.2885 | 0.443367 | 1.0 | -0.083333 | -0.321113 | -0.35137 | 0.361922 | -0.197327 | 0.018732 | -0.273955 | -0.276769 | -0.440597 | 0.517859 |
| Magnesium | 0.270798 | -0.054575 | 0.286587 | -0.083333 | 1.0 | 0.214401 | 0.195784 | -0.256294 | 0.236441 | 0.19995 | 0.055398 | 0.066004 | 0.393351 | -0.209179 |
| Total_Phenols | 0.289101 | -0.335167 | 0.12898 | -0.321113 | 0.214401 | 1.0 | 0.864564 | -0.449935 | 0.612413 | -0.055136 | 0.433681 | 0.699949 | 0.498115 | -0.719163 |
| Flavanoids | 0.236815 | -0.411007 | 0.115077 | -0.35137 | 0.195784 | 0.864564 | 1.0 | -0.5379 | 0.652692 | -0.172379 | 0.543479 | 0.787194 | 0.494193 | -0.847498 |
| Nonflavanoid_Phenols | -0.155929 | 0.292977 | 0.18623 | 0.361922 | -0.256294 | -0.449935 | -0.5379 | 1.0 | -0.365845 | 0.139057 | -0.26264 | -0.50327 | -0.311385 | 0.489109 |
| Proanthocyanins | 0.136698 | -0.220746 | 0.009652 | -0.197327 | 0.236441 | 0.612413 | 0.652692 | -0.365845 | 1.0 | -0.02525 | 0.295544 | 0.519067 | 0.330417 | -0.49913 |
| Color_Intensity | 0.546364 | 0.248985 | 0.258887 | 0.018732 | 0.19995 | -0.055136 | -0.172379 | 0.139057 | -0.02525 | 1.0 | -0.521813 | -0.428815 | 0.3161 | 0.265668 |


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Alcohol               178 non-null    float64
 1   Malic_Acid            178 non-null    float64
 2   Ash                   178 non-null    float64
 3   Ash_Alcanity          178 non-null    float64
 4   Magnesium             178 non-null    int64  
 5   Total_Phenols         178 non-null    float64
 6   Flavanoids            178 non-null    float64
 7   Nonflavanoid_Phenols  178 non-null    float64
 8   Proanthocyanins       178 non-null    float64
 9   Color_Intensity       178 non-null    float64
 10  Hue                   178 non-null    float64
 11  OD280                 178 non-null    float64
 12  Proline               178 non-null    int64  
 13  Customer_Segment      178 non-null    int64  
dtypes: float64(11), int64(3)
memory usage: 19.6 KB


None

### Default argument and keyword arguments
In a definition of a function, we can give default values to the arguments the function takes:

In [136]:
def myfunc(x, p=2, debug=False):
    if debug:
        print("evaluating myfunc for x = " + str(x) + " using exponent p = " + str(p))
    return x**p

If we don't provide a value for the `debug` argument when calling the function `myfunc`, it defaults to the value provided in the function definition:

In [137]:
myfunc(5)

25

In [138]:
myfunc(5, debug=True)

evaluating myfunc for x = 5 using exponent p = 2


25


If we explicitly list the names of the arguments in a function call, they do not need to come in the same order as in the function definition. These are called keyword arguments, and they are often very useful in functions that take a lot of optional arguments.

In [139]:
myfunc(p=3, debug=True, x=7)

evaluating myfunc for x = 7 using exponent p = 3


343
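Default and keyword arguments combine naturally: a keyword lets us set one optional argument while leaving earlier ones at their defaults. The sketch below uses a hypothetical helper `scale`, which is not part of the course code.

```python
def scale(values, factor=1, offset=0):
    """Return a new list with each value multiplied by factor, then shifted by offset."""
    return [v * factor + offset for v in values]

data = [1, 2, 3]
unchanged = scale(data)                    # both defaults used
doubled = scale(data, 2)                   # factor passed by position
shifted = scale(data, offset=10)           # skip factor, set only offset
both = scale(data, offset=10, factor=2)    # keywords in any order
print(unchanged, doubled, shifted, both)
```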

### Unnamed functions (lambda functions)
The keyword is `lambda`, followed by its arguments (written exactly like the argument list you enclose in parentheses in a `def` header), followed by a colon and an expression:

`lambda` argument1, argument2, ..., argumentN `:` expression using arguments

- In Python, small anonymous (unnamed) functions can be created with the `lambda` keyword. A lambda can be used as **an argument** to another function wherever a function object is required, but syntactically it is restricted to a single expression.

- **lambda is an expression, not a statement**. Because of this, a `lambda` can appear in places where a `def` is not allowed by Python's syntax, for example inside a list literal or a function call's arguments. With `def`, a function can be referenced by name in those places, but it must be created elsewhere. As an expression, `lambda` returns a value (a new function) that can optionally be assigned a name. In contrast, the **def** statement always assigns the new function to the name in the header instead of returning it as a result.

- A lambda's body is a **single expression**, not a block of statements, so **lambda is designed for coding simple functions, and def handles larger tasks**.
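The two placements mentioned above, inside a list and inside a function call's arguments, can be sketched like this. The list `wines` holds made-up values for illustration only.

```python
# A lambda can sit directly inside a list literal
operations = [lambda x: x + 1, lambda x: x * 2, lambda x: x ** 2]
results = [op(3) for op in operations]
print(results)  # [4, 6, 9]

# ... or inside a function call's arguments, here as the sort key
wines = [("Merlot", 13.5), ("Riesling", 11.0), ("Zinfandel", 15.2)]  # made-up values
by_alcohol = sorted(wines, key=lambda wine: wine[1])
print(by_alcohol[0])  # ('Riesling', 11.0)
```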

### lambda vs. function

In [140]:
def f2(x): 
    return x**2

In [141]:
f2(4)

16

In [142]:
lambda x: x**2

<function __main__.<lambda>(x)>

In [143]:
f1 = lambda x: x**2
    
# is equivalent to 

def f2(x):return x**2

In [144]:
f1(3), f2(3)

(9, 9)

#### Example:

In [145]:
def func(x,y,z): 
    return x+y+z

In [146]:
func(2,3,4)

9

In [147]:
f=lambda x,y,z:x+y+z
f(2,3,4)

9

#### Exercise:

In [148]:
def time2(var): return var*2

In [149]:
# Rewrite the above function as a lambda function:
t=lambda var:var*2

In [150]:
t(6)

12

In [151]:
time2(6)

12

#### This technique is useful, for example, when we want to pass a simple function as an argument to another function, like this:

In [152]:
lambda num: num*3

<function __main__.<lambda>(num)>

In [153]:
map(lambda num: num*3, seq)

<map at 0x20bd8b1b8b0>

In [154]:
seq

[1, 2, 3, 4, 5]

In [155]:
list(map(lambda num: num*3, seq))

[3, 6, 9, 12, 15]

In [156]:
filter(lambda num: num%2 == 0, seq)

<filter at 0x20bd8b1bd60>

In [157]:
list(filter(lambda num: num%2 == 0, seq))

[2, 4]
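Outputs such as `<map at 0x...>` and `<filter at 0x...>` appear because `map()` and `filter()` return lazy iterators: nothing is computed until the values are requested, for example with `list()`. A short sketch of that behavior, reusing the same `seq`:

```python
seq = [1, 2, 3, 4, 5]

tripled = map(lambda num: num * 3, seq)  # nothing is computed yet
first_pass = list(tripled)    # [3, 6, 9, 12, 15]
second_pass = list(tripled)   # [] because the iterator is already used up

# filter() and map() can be chained: triple only the even numbers
even_tripled = list(map(lambda num: num * 3, filter(lambda num: num % 2 == 0, seq)))
print(first_pass, second_pass, even_tripled)
```

Because an iterator can be consumed only once, convert it to a list if the results are needed more than once.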

In [158]:
# Exercise: Create a lambda function

def times(x,y,z):     
    return x*y/z  

In [159]:
# Put your lambda function here
seq=[1,2,3,4,5]
lambda x,y,z:x*y/z

<function __main__.<lambda>(x, y, z)>

In [160]:
# Use map() to apply the lambda function to seq


In [161]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    title = person.split()[0]
    lastname = person.split()[-1]
    return '{} {}'.format(title, lastname)
list(map(split_title_and_name, people))

['Dr. Brooks', 'Dr. Collins-Thompson', 'Dr. Vydiswaran', 'Dr. Romero']

In [162]:
people = 'Dr. Christopher Brooks'
people.split()[0]   # split() is a string method: it works here because people is a str, not a list

'Dr.'

In [163]:
people = ['Dr. Christopher Brooks', 'Dr. Kevyn Collins-Thompson', 'Dr. VG Vinod Vydiswaran', 'Dr. Daniel Romero']

def split_title_and_name(person):
    return person.split()[0] + ' ' + person.split()[-1]

#option 1
for person in people:
    print(split_title_and_name(person) == (lambda x: x.split()[0] + ' ' + x.split()[-1])(person))

#option 2
list(map(split_title_and_name, people)) == list(map(lambda person: person.split()[0] + ' ' + person.split()[-1], people))

True
True
True
True


True

In [164]:
# person still holds the last item from the for loop in the previous cell
person.split()[0] + ' ' + person.split()[-1]

'Dr. Romero'

#### Note: The course materials are developed mainly based on personal experience and contributions from the Python learning community
Reference books:
- Learning Python, 5th Edition, by Mark Lutz
- Python Data Science Handbook, by Jake VanderPlas
- Python for Data Analysis, by Wes McKinney

Copyright ©2023 Mei Najim. All rights reserved. 