
# **Pandas**

The pandas library provides high-performance, easy-to-use data structures and data analysis tools. The main data structure is the DataFrame, which you can think of as an in-memory 2D table (like a spreadsheet, with column names and row labels). Many features available in Excel are available programmatically, such as creating pivot tables, computing columns based on other columns, plotting graphs, etc. You can also group rows by column value, or join tables much like in SQL. Pandas is also great at handling time series.

pandas provides high-level data structures and functions designed to make working with structured or tabular data intuitive and flexible.
https://learning.oreilly.com/library/view/python-for-data/9781098104023/ch01.html#essential_pandas


https://pandas.pydata.org/docs/user_guide/index.html#userguide

# **Python Language Basics**

In [1]:
# Introspection
# In IPython/Jupyter, using a question mark (?) before or after a variable displays some general information about the object:

b = [1, 2, 3]
b?

In [2]:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

In [3]:
add_numbers?

In [4]:
"""
? has a final usage, which is searching the IPython namespace.
Combining some characters with the wildcard (*) lists all names that match the wildcard expression.
"""

import numpy as np
np.*load*?

## **Language Semantics**

The Python language design is distinguished by its emphasis on readability, simplicity, and explicitness. Some people go so far as to liken it to “executable pseudocode.”

**Indentation, not braces**
Python uses whitespace (tabs or spaces) to structure code instead of using braces as in many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting algorithm:

In [5]:
array = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
less, greater = [], []
pivot = 5
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)

less, greater

([1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

A colon denotes the start of an indented code block after which all of the code must be indented by the same amount until the end of the block.

I strongly recommend using four spaces as your default indentation and replacing tabs with four spaces.

As you can see by now, Python statements also do not need to be terminated by semicolons. Semicolons can be used, however, to separate multiple statements on a single line:

a = 5; b = 6; c = 7

An important characteristic of the Python language is the consistency of its object model. Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box,” which is referred to as a Python object.

**Comments**
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. This is often used to add comments to code.

You call functions using parentheses, passing zero or more arguments and optionally assigning the returned value to a variable.
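A minimal sketch of calling functions; `greet` is a hypothetical helper defined only for this illustration:

```python
def greet(name, greeting="Hello"):
    # Hypothetical helper: builds a greeting string from its arguments
    return f"{greeting}, {name}!"

# One positional argument; the default greeting is used
message = greet("Ana")
print(message)  # Hello, Ana!

# A positional argument plus a keyword argument
message2 = greet("Ana", greeting="Hola")
print(message2)  # Hola, Ana!

# A call with zero arguments, assigning the returned value
empty = list()
print(empty)  # []
```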

**Dynamic references, strong types**
Variables in Python have no inherent type associated with them; a variable can refer to a different type of object simply by doing an assignment.

In [6]:
a = 5

print(type(a))

a = "foo"

print(type(a))

<class 'int'>
<class 'str'>
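Although variables have no type, the objects they refer to are strongly typed: Python does not silently convert a string into a number. A short sketch of what that means in practice:

```python
a = 5
s = "5"

# Mixing str and int raises TypeError instead of coercing one into the other
try:
    s + a
except TypeError as e:
    print("TypeError:", e)

# An explicit conversion is required
print(int(s) + a)  # 10
print(s + str(a))  # 55

# Numeric types are an exception: int + float promotes the result to float
print(a + 2.5)     # 7.5

# isinstance checks an object's type (or one of several types)
print(isinstance(a, int))           # True
print(isinstance(s, (int, float)))  # False
```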


In [7]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

print(isiterable(False))
print(isiterable(5))
print(isiterable('String'))
print(isiterable([1,2,3]))

False
False
True
True


To check if two variables refer to the same object, use the is keyword. Use is not to check that two objects are not the same:

In [8]:
a = [1, 2, 3]
b = a
c = list(a)

print(a is b)
print(a is c)

True
False
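`is not` is simply the negation of `is`, and identity is different from equality: `c` above is a distinct object that holds the same values as `a`. A short sketch:

```python
a = [1, 2, 3]
b = a
c = list(a)

# 'is not' is the negation of 'is'
print(a is not c)  # True: list(a) creates a new list object
# '==' compares values, not identity
print(a == c)      # True: both lists hold the same elements

# 'is' is the idiomatic way to check for None
d = None
print(d is None)   # True
```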


In [9]:
z = None
print(type(z))
print(z)

<class 'NoneType'>
None


In [10]:
c = """
This is a longer string that
spans multiple lines
"""
print(c)


This is a longer string that
spans multiple lines



In [11]:
# Many Python objects can be converted to a string using the str function:

a = 5.6
s = str(a)
print(s)

5.6


In [12]:
# raw string
s = r"this\has\no\special\characters"
s

'this\\has\\no\\special\\characters'

In [13]:
template = "{0:.2f} {1:s} are worth US${2:d}"
template.format(88.46, "Argentine Pesos", 1)

'88.46 Argentine Pesos are worth US$1'

In [14]:
amount = 10
rate = 88.46
currency = "Pesos"

result = f"{amount} {currency} is worth US${amount / rate}"
result

'10 Pesos is worth US$0.11304544426859599'
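The format specifiers used with `str.format` also work inside f-strings, which avoids the long float in the output above. A sketch reusing the same variables:

```python
amount = 10
rate = 88.46
currency = "Pesos"

# ':.2f' formats the float with two decimal places
result = f"{amount} {currency} is worth US${amount / rate:.2f}"
print(result)  # 10 Pesos is worth US$0.11
```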

In [15]:
val = "español"
val_utf8 = val.encode("utf-8")
print(val_utf8)
print(type(val_utf8))

b'espa\xc3\xb1ol'
<class 'bytes'>


In [16]:
val2 = val_utf8.decode("utf-8")
print(val2)
print(type(val2))

print(val.encode("latin1"))

print(val.encode("utf-16"))

print(val.encode("utf-16le"))

español
<class 'str'>
b'espa\xf1ol'
b'\xff\xfee\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'
b'e\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'


## **Dates and times**

In [17]:
from datetime import datetime, date, time

dt = datetime(2011, 10, 29, 20, 30, 21)
print(dt.day)
print(dt.minute)
print(dt.date())
print(dt.time())

print(dt.strftime("%Y-%m-%d %H:%M"))

dt2 = datetime(2011, 11, 15, 22, 30)

delta = dt2 - dt
print(delta)
print(type(delta))

print(dt + delta)


29
30
2011-10-29
20:30:21
2011-10-29 20:30
17 days, 1:59:39
<class 'datetime.timedelta'>
2011-11-15 22:30:00


In [18]:
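# datetime objects are immutable: replace returns a new object (2011-10-29 20:00:00) and dt is unchanged
dt_hour = dt.replace(minute=0, second=0)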
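`strftime` has an inverse, `datetime.strptime`, which parses a string into a `datetime` given a format. A short sketch; the date string is made up for illustration:

```python
from datetime import datetime, timedelta

# Parse a string using the same format codes that strftime uses
parsed = datetime.strptime("20091031", "%Y%m%d")
print(parsed)  # 2009-10-31 00:00:00

# timedelta objects can also be built directly and added to a datetime
later = parsed + timedelta(days=1, hours=12)
print(later)   # 2009-11-01 12:00:00
```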

## **Control Flow**

Python has several built-in keywords for conditional logic, loops, and other standard control flow concepts found in other programming languages.

### **if, elif, and else**

In [19]:
x = -7
if x < 0:
    print("x is negative")

x is negative


In [20]:
x = 5

if x < 0:
    print("x is negative")
elif x == 0:
    print("x is equal to zero")
elif 0 < x < 5:
    print("x is positive but smaller than 5")
else:
    print("x is positive and larger than or equal to 5")

x is positive and larger than or equal to 5


In [21]:
5 > 3 > 2 > 0

True

In [22]:
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue # skipping the rest of the block
    total += value

print(total)

12


In [23]:
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break # exiting the block altogether
    total_until_5 += value
print(total_until_5)

# The break keyword only terminates the innermost for loop;
# any outer for loops will continue to run:

for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

13
(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)


In [24]:
triples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in triples:
    print(a + b + c)

6
15
24


### **while loops**
A while loop specifies a condition and a block of code that is to be executed until the condition evaluates to False or the loop is explicitly ended with break:

In [25]:
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    print("total = " + str(total) + " + " + str(x) + " = " + str(total + x))
    total += x
    print("New value for x = " + "str(x // 2) = " + str(x // 2))
    x = x // 2

total = 0 + 256 = 256
New value for x = str(x // 2) = 128
total = 256 + 128 = 384
New value for x = str(x // 2) = 64
total = 384 + 64 = 448
New value for x = str(x // 2) = 32
total = 448 + 32 = 480
New value for x = str(x // 2) = 16
total = 480 + 16 = 496
New value for x = str(x // 2) = 8
total = 496 + 8 = 504
New value for x = str(x // 2) = 4


### **pass**
pass is the “no-op” (or “do nothing”) statement in Python. It can be used in blocks where no action is to be taken (or as a placeholder for code not yet implemented); it is required only because Python uses whitespace to delimit blocks:

In [26]:
if x < 0:
    print("negative!")
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print("positive!")

positive!
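`pass` is also a common placeholder for function or class bodies that are not written yet. A minimal sketch; `process_data` and `Placeholder` are hypothetical names:

```python
def process_data(data):
    # Hypothetical function: body not implemented yet; pass keeps it syntactically valid
    pass

class Placeholder:
    # An empty class body also needs at least one statement
    pass

# A function whose body is only pass returns None
print(process_data([1, 2, 3]))  # None
print(Placeholder())
```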


## **range**
The range function generates a sequence of evenly spaced integers:

In [27]:
range(10)
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [28]:
list(range(0, 20, 2))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
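The step can also be negative, and a `range` object is lazy: it stores only start, stop, and step rather than every element. A short sketch:

```python
# A negative step counts down; the end point is still excluded
countdown = list(range(10, 0, -2))
print(countdown)  # [10, 8, 6, 4, 2]

# Length, indexing, and membership work without building a list
r = range(0, 1_000_000, 5)
print(len(r))     # 200000
print(r[3])       # 15
print(995 in r)   # True
```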

In [29]:
# A common use of range is for iterating through sequences by index:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    print(f"element {i}: {seq[i]}")

element 0: 1
element 1: 2
element 2: 3
element 3: 4
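When both the index and the value are needed, the built-in `enumerate` is generally considered more idiomatic than `range(len(seq))`. This sketch prints the same lines as the cell above:

```python
seq = [1, 2, 3, 4]

# enumerate yields (index, value) pairs, so no seq[i] lookup is needed
for i, value in enumerate(seq):
    print(f"element {i}: {value}")

pairs = list(enumerate(seq))
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```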


# **Data Structures and Sequences**

Python’s data structures are simple but powerful. Mastering their use is a critical part of becoming a proficient Python programmer. We start with tuple, list, and dictionary, which are some of the most frequently used built-in data structures (tuple and list are sequence types; a dictionary is a mapping).

## **Tuple**

In [30]:
tup = (4, 5, 6)
print(tup)
print(type(tup))

# also
tup = 4, 5, 6
print(tup)
print(type(tup))

(4, 5, 6)
<class 'tuple'>
(4, 5, 6)
<class 'tuple'>


In [32]:
tup = tuple("string")
print(tup)
print(tup[1])

('s', 't', 'r', 'i', 'n', 'g')
t


In [35]:
nested_tup = (4, 5, 6), (7, 8)
print(nested_tup[0])
print(nested_tup[1][0])

(4, 5, 6)
7


In [38]:
tup = tuple(['foo', [1, 2], True])
tup[1].append(3)
tup

('foo', [1, 2, 3], True)
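The cell above modifies a list stored *inside* the tuple; the tuple's own slots, however, cannot be reassigned. A short sketch of the difference:

```python
tup = tuple(['foo', [1, 2], True])

# The tuple itself is immutable: assigning to a slot raises TypeError
try:
    tup[2] = False
except TypeError as e:
    print("TypeError:", e)

# A mutable object held by the tuple can still be changed in place
tup[1].append(3)
print(tup)  # ('foo', [1, 2, 3], True)
```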

In [43]:
t = (4, None, 'foo') + (6, 0) + ('bar',)
print(t)
t = (3,5) * 15
print(t)
print(len(t))

(4, None, 'foo', 6, 0, 'bar')
(3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5)
30


In [45]:
# A common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9
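Unpacking also makes it easy to swap variables and to pull apart nested tuples. A short sketch:

```python
a, b = 1, 2
# Swap without a temporary variable
b, a = a, b
print(a, b)  # 2 1

# A nested pattern on the left matches a nested tuple on the right
tup = 4, 5, (6, 7)
x, y, (z, w) = tup
print(w)     # 7
```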


In [46]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
print(rest)

# This rest bit is sometimes something you want to discard;
# there is nothing special about the rest name.
# As a matter of convention, many Python programmers
# will use the underscore (_) for unwanted variables:
a, b, *_ = values

[3, 4, 5]

In [51]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

## **List**

https://learning.oreilly.com/library/view/python-for-data/9781098104023/ch03.html