# **INTRODUCTION TO DATA SCIENCE WITH PYTHON**


## **WEEK 1**
### **Fundamentals of Data Manipulation with Python**

#### **Learning Objectives**

---

- Load, manipulate, and select data using NumPy, as well as understand the fundamental data types in the NumPy ecosystem
- Show the benefits of vectorization with NumPy data
- Apply regular expressions to string data
- Demonstrate, at a high level, how regular expression pattern matching is expressed

##### **Summary**

1. **Python's Readability:** Python is a highly versatile programming language known for its simplicity and readability. Its clean syntax makes it accessible to those without prior programming experience.

2. **Interactive Learning:** Python's interactive nature allows users to write and evaluate code line by line, making it suitable for tasks requiring investigation and experimentation.

3. **Dynamic Typing:** Python is dynamically typed: types belong to values rather than to variable names, so a variable needs no type declaration and can later be rebound to a value of a different type. This flexibility is especially beneficial in an interactive environment.

4. **Jupyter Notebooks:** Python seamlessly integrates with Jupyter notebooks, an environment that allows code to be divided into cells and executed on demand. It's an ideal tool for both learning and experimentation.

5. **Simplicity:** Python minimizes boilerplate code, enabling learners to perform tasks with minimal setup. Variables can be set and manipulated with ease.

6. **Interactive Interpreter:** Python's interactive mode facilitates immediate code execution and real-time feedback, enhancing the learning process.

7. **Function Basics:** Python functions are defined with the `def` keyword, and indentation delimits the function body. They support default parameter values, which makes parameters optional for the caller.
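
The dynamic typing described in point 3 can be seen in a short sketch. The variable name `value` is only illustrative and does not come from the course material.

```python
# The same name can be rebound to values of different types.
value = 1
first_type = type(value).__name__

value = 'one'
second_type = type(value).__name__

# The type follows the value, not the variable name.
print(first_type, second_type)
```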


In [79]:
# Variables

x = 1
y = 2

x + y

3

### **Python Functions**


- **No Return Type Declaration:** In Python, you don't have to specify the return type of a function. Unlike some other languages where you might declare a function to return a specific data type (e.g., int, string), Python is dynamically typed, so you can return any type of data without specifying it beforehand.

- **No Explicit Return Statement:** You don't necessarily have to include a return statement in your Python functions. If you omit the return statement, the function will automatically return a special value called `None`. This is similar to the concept of `null` in Java and indicates the absence of a value being returned.

- **Default Parameter Values:** Python allows you to set default values for function parameters. For example, you can define a function with default parameter values, and if the caller doesn't provide a value for a particular parameter, the default value is used. This feature is useful for creating functions with optional arguments. It eliminates the need to overload functions with different parameter lists.
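
As a minimal sketch of the second point, the function below has no `return` statement, so calling it yields `None`. The function `greet` and the name passed to it are hypothetical and used only for illustration.

```python
def greet(name):
    # Hypothetical helper: prints a greeting but has no return statement.
    print('Hello, ' + name)

result = greet('Ada')

# With no return statement, the call evaluates to None.
print(result is None)
```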

**Here's an example related to the third point:**

```python
def add_numbers(a, b, c=None):
    if c is None:
        # c was not provided, so add only a and b
        return a + b
    else:
        # c was provided, so include it in the sum
        return a + b + c
```


In [80]:
def add_numbers(x, y):
    return x + y

add_numbers(1, 2)

3

In [81]:
def add_three_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z


print(add_three_numbers(1, 2))

3


In [82]:
def add_three_numbers(x, y, z=None):
    if z is None:
        return x + y
    else:
        return x + y + z


print(add_three_numbers(1, 2, 3))

6


This function should add the two values if the value of the "kind" parameter is "add" or is not passed in, otherwise it should subtract the second value from the first. 

Can you fix the function so that it works?


```python
# Exercise: replace each ? with a parameter so the function works as described.
def do_math(?, ?, ?):
    if kind == 'add':
        return a + b
    else:
        return a - b

do_math(1, 2)
```

In [83]:
def do_math(a, b, kind='add'):
    if kind == 'add':
        return a + b
    else:
        return a - b

do_math(1, 2)

3

In [84]:
def do_math(a, b, kind='add'):
    if kind == 'add':
        return a + b
    else:
        return a - b

# Any value of kind other than 'add' triggers subtraction
do_math(1, 2, 'subtract')

-1
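
The default value can also be overridden by name. The sketch below repeats the `do_math` definition so that it runs on its own; the string `'subtract'` is an arbitrary choice, since any value other than `'add'` takes the subtraction branch.

```python
def do_math(a, b, kind='add'):
    if kind == 'add':
        return a + b
    return a - b

# kind is omitted, so the default 'add' is used.
default_result = do_math(1, 2)

# kind is passed by keyword, which makes the intent explicit at the call site.
keyword_result = do_math(1, 2, kind='subtract')

# kind is passed by position, which is equivalent to the keyword form.
positional_result = do_math(1, 2, 'subtract')

print(default_result, keyword_result, positional_result)
```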

### **Python Types and Sequences**

- **Data Types in Python**: Python does not enforce static typing, but every value still has a type. The `type` function returns the type of the object a name refers to. Common data types include strings, `NoneType` (the type of `None`, representing the absence of a value), integers, and floating-point numbers. Functions are objects too and have the type `function`.

- **Collections in Python**: Python's core revolves around various sequence and collection types. Three native collections are discussed: tuples, lists, and dictionaries.

- **Tuples**: Tuples are ordered sequences of values that are immutable once created. They are written with parentheses and can hold a mix of data types.

- **Lists**: Lists are similar to tuples but mutable, allowing for changes in length and element values. Lists are declared using square brackets and can be modified using methods like `append`.

- **Iterating Collections**: Both lists and tuples are iterable types, allowing the use of `for` loops to iterate through their elements. The loop variable needs no type declaration.

- **Indexing and Slicing**: Python enables indexing and slicing of lists, tuples, and strings. Slicing involves specifying a start and an exclusive end position. Negative indices can be used to index from the end. Slicing is crucial for scientific computing and manipulating strings.

- **String Manipulation**: Strings in Python can be manipulated through operations like concatenation (`+`), repetition (`*`), and searching (`in`). The `split` method breaks a string into a list of substrings around a separator such as a space.

- **Dictionaries**: Dictionaries are collections of key-value pairs written with curly braces. They are accessed by key rather than by position; since Python 3.7 they preserve insertion order when iterated. Items can be added, iterated through, and retrieved by key.

- **Unpacking Sequences**: Python supports sequence unpacking, where the items of a sequence (list or tuple) are assigned to multiple variables in one statement. Values are assigned in order, and the number of variables must match the number of items.
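
The difference between mutable lists and immutable tuples can be sketched as follows. The variable names are illustrative, and the values mirror the examples used in the cells of this section.

```python
example_tuple = (1, 'a', 2, 'b')
example_list = [1, 'a', 2, 'b']

# Lists are mutable: elements can be replaced and the length can change.
example_list[0] = 99
example_list.append(3)

# Tuples are immutable: item assignment raises a TypeError.
try:
    example_tuple[0] = 99
    tuple_error = None
except TypeError as error:
    tuple_error = error

print(example_list)
print(type(tuple_error).__name__)
```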


#### **Basics**
---

In [85]:
type(None)

NoneType

In [86]:
type(2)

int

In [87]:
type(1.9)

float

In [88]:
type(add_numbers)

function

In [89]:
exampleTuple = (1, 'a', 2, 'b')
type(exampleTuple)

tuple

In [90]:
exampleList = [1, 'a', 2, 'b']
type(exampleList)

list

In [91]:
exampleList.append(3)
print(exampleList)

[1, 'a', 2, 'b', 3]


In [92]:
for item in exampleList:
    print(item)

1
a
2
b
3


In [93]:
i = 0

while i < len(exampleList):
    print(exampleList[i])
    i += 1

1
a
2
b
3


In [94]:
numberList = [3, 4, 6, 1, 7, 8, 35, 56, 23, 78, 12, 34, 89, 34, 56, 74, 18, 84, 75]


In [95]:
min(numberList)

1

In [96]:
max(numberList)

89

#### **Operations**
---

In [97]:
listA = [1, 2, 3, 4]
listB = [4, 5, 6, 7]

In [98]:
listA * 3

[1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]

In [99]:
listA + listB

[1, 2, 3, 4, 4, 5, 6, 7]

In [100]:
8 in listB

False

In [101]:
2 in listA

True

#### **Slicing**
---

In [102]:
example_string  = 'Hello World!'

print(example_string[0])
print(example_string[0:1])
print(example_string[0:2])


H
H
He


In [103]:
example_string[-1]

'!'

In [104]:
example_string[-4:-2]

'rl'

In [105]:
example_string[3:]

'lo World!'

In [106]:
example_string[:3]

'Hel'
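
The same slicing syntax applies to lists and tuples, not only strings. A minimal sketch, with made-up sample values:

```python
numbers = [3, 4, 6, 1, 7, 8]
letters = ('a', 'b', 'c', 'd')

# The end position is exclusive, so [:3] returns the items at indices 0, 1, and 2.
first_three = numbers[:3]

# A negative start counts from the end of the sequence.
last_two = numbers[-2:]

# Slicing a tuple returns a new tuple.
middle = letters[1:3]

print(first_three, last_two, middle)
```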

### **Manipulate Strings**
---

- **Regular Expression Evaluation**: Regular expressions, often called regex or regexp, are a powerful tool for pattern matching and text manipulation. A pattern combines literal characters with special characters and rules to search, extract, or manipulate strings.
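
As a rough illustration of how a pattern is expressed, the sketch below uses Python's standard `re` module. The sample sentence is made up, and the pattern is deliberately simplified; it is an assumption for this example and not a complete email validator.

```python
import re

# Made-up sample text for illustration.
text = 'Contact john@gmail.com or bill@gmail.com for details.'

# \w+ matches one or more word characters; @ and \. match those characters literally.
pattern = r'\w+@\w+\.\w+'

# findall returns every non-overlapping match as a list of strings.
emails = re.findall(pattern, text)

# search returns a match object for the first match, or None if there is none.
first = re.search(pattern, text)

print(emails)
print(first.group())
```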


#### **Basic tools for text analysis in Python**

In [107]:
firstName = 'Roberto'
lastName = 'Hernandez'

fullName = firstName + ' ' + lastName

fullName

'Roberto Hernandez'

In [108]:
firstName * 3

'RobertoRobertoRoberto'

In [109]:
'Roberto' in firstName

True

In [110]:
firstName = 'Roberto Anibal Hernandez'.split(' ')[0]
print(firstName)

Roberto


In [111]:
lastName = 'Roberto Anibal Hernandez'.split(' ')[-1]
print(lastName)

Hernandez
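
The separator passed to `split` matters. The sketch below contrasts an explicit single-space separator with the default whitespace behavior; the input string with two consecutive spaces is a made-up example.

```python
# Made-up input containing two consecutive spaces after the first name.
full_name = 'Roberto  Anibal Hernandez'

# An explicit ' ' separator treats each space as a boundary,
# so consecutive spaces produce an empty string in the result.
by_space = full_name.split(' ')

# With no argument, split uses runs of whitespace as a single separator.
by_whitespace = full_name.split()

# Any separator string can be used, for example a comma.
fields = 'a,b,c'.split(',')

print(by_space)
print(by_whitespace)
print(fields)
```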


#### **Python Dictionaries**

In [112]:
test_contacts_data = {'John Doe':'john@gmail.com', 'Bill Gates':'bill@gmail.com'}

test_contacts_data['Bill Gates']

'bill@gmail.com'

In [113]:
# Add a new key, 'Cristiano Ronaldo', with the value None.
# The lookup below displays nothing because its result is None.

test_contacts_data['Cristiano Ronaldo'] = None
test_contacts_data['Cristiano Ronaldo']


In [114]:
for name in test_contacts_data:
    print(test_contacts_data[name])

john@gmail.com
bill@gmail.com
None


In [115]:
for email in test_contacts_data.values():
    print(email)

john@gmail.com
bill@gmail.com
None


In [116]:
for name, email in test_contacts_data.items():
    print(name)
    print(email)

John Doe
john@gmail.com
Bill Gates
bill@gmail.com
Cristiano Ronaldo
None
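
Two further dictionary operations may be useful here: membership tests with `in` and lookups with `get`, which avoids a `KeyError` for a missing key. In this sketch the key `'Ada Lovelace'` and the fallback string are hypothetical.

```python
contacts = {'John Doe': 'john@gmail.com', 'Bill Gates': 'bill@gmail.com'}
contacts['Cristiano Ronaldo'] = None

# Iterating a dictionary yields its keys in insertion order (Python 3.7+).
names = list(contacts)

# 'in' tests whether a key is present.
has_john = 'John Doe' in contacts

# get returns the fallback instead of raising KeyError when the key is missing.
missing = contacts.get('Ada Lovelace', 'no email on record')

print(names)
print(has_john, missing)
```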


In [117]:
test_tuple_data = ('Roberto', 'Hernandez', 'contact@rahz.com')

fname, lname, primary_email = test_tuple_data

fname

'Roberto'
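
Unpacking only works when the number of variables matches the number of items. A minimal sketch of what happens otherwise, reusing the tuple from the cell above:

```python
test_tuple_data = ('Roberto', 'Hernandez', 'contact@rahz.com')

# Three variables for three items: values are assigned in order.
fname, lname, primary_email = test_tuple_data

# Two variables for three items: Python raises a ValueError.
try:
    first, last = test_tuple_data
    unpack_error = None
except ValueError as error:
    unpack_error = error

print(fname, lname, primary_email)
print(type(unpack_error).__name__)
```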