# Data Science Day 2

## Built-In Types: Simple Values

- Type: int, e.g., x = 1 (integers i.e., whole numbers)
- Type: float, e.g., x = 1.0 (floating-point numbers, i.e., finite-precision approximations of real numbers)
- Type: complex, e.g., x = 1 + 2j (complex numbers i.e., numbers with real and imaginary part)
- Type: bool, e.g., x = True (Boolean: True/False values)
- Type: str, e.g., x = 'abc' (string: characters or text)
- Type: NoneType, e.g., x = None (special object indicating nulls)

### Integers

In [4]:
x = 1

### Floating-Point Numbers

In [1]:
x = 0.000005
y = 5e-6
print(x == y)

True


In [2]:
float(1)

1.0

- Floating-point precision is limited, so equality tests between floats can give unexpected results

In [3]:
0.1 + 0.2 == 0.3

False

- This happens because most decimal fractions, such as 0.1, have no exact binary floating-point representation, so the stored values are rounded

In [5]:
print("0.1 = {0:0.17f}".format(0.1))
print("0.2 = {0:0.17f}".format(0.2))
print("0.3 = {0:0.17f}".format(0.3))

0.1 = 0.10000000000000001
0.2 = 0.20000000000000001
0.3 = 0.29999999999999999
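
- A common remedy, shown here as a minimal sketch, is to compare floats within a tolerance rather than with ==. It uses math.isclose from the standard library; the variable names are illustrative only

```python
import math

total = 0.1 + 0.2

# Exact equality fails because both sides carry small rounding errors.
exact_match = (total == 0.3)

# math.isclose compares within a relative tolerance (default rel_tol=1e-09).
close_match = math.isclose(total, 0.3)

print(exact_match, close_match)
```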


### Complex Numbers

In [6]:
complex(1, 2)

(1+2j)

In [7]:
1 + 2j

(1+2j)

In [8]:
c = 3 + 4j

In [9]:
c.real #real part

3.0

In [10]:
c.imag #imaginary part

4.0

In [11]:
c.conjugate() #complex conjugate

(3-4j)

In [12]:
abs(c) #magnitude, i.e., sqrt(c.real ** 2 + c.imag ** 2)

5.0

### String Type

In [13]:
message = "what do you like?"
response = 'spam'

In [14]:
#length of string
len(response)

4

In [15]:
#make upper-case. see also str.lower()
response.upper()

'SPAM'

In [16]:
#capitalize. see also str.title()
message.capitalize()

'What do you like?'

In [17]:
#concatenation with +
message + response

'what do you like?spam'

In [18]:
#multiplication is multiple concatenation
5 * response

'spamspamspamspamspam'

In [19]:
#access individual characters (zero-based indexing)
message[0]

'w'

### None Type

In [20]:
type(None)

NoneType

In [21]:
return_value = print('abc')

abc


In [22]:
print(return_value)

None


- Any Python function that finishes without an explicit return statement returns None
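
- The same behavior can be seen with a user-defined function. The sketch below assumes a hypothetical helper named greet that prints a message and has no return statement

```python
def greet(name):
    # Hypothetical helper: it prints, but never executes a return statement.
    print("hello", name)

result = greet("world")

# The call still produces a value, and that value is None.
print(result is None)
```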

### Boolean Type

In [23]:
result = (4 < 5)
result

True

In [24]:
type(result)

bool

- Boolean values are case sensitive: True and False must be capitalized!

In [25]:
print(True, False)

True False


- Booleans can also be constructed with the bool() constructor, e.g., **any numeric type is False if equal to zero and True otherwise**

In [26]:
bool(2014)

True

In [27]:
bool(0)

False

In [28]:
bool(3.1415)

True

In [29]:
bool(None)

False

- For strings, bool(s) is False for empty strings and True otherwise

In [30]:
bool("")

False

In [31]:
bool("abc")

True

### Conversion between Datatypes

- Convert between different datatypes using type conversion functions, e.g., int(), float(), str(), etc.

In [32]:
float(5) #convert integer to float using the float() constructor

5.0

In [33]:
int(100.5) #convert float to integer using int(); the fractional part is truncated, not rounded

100

In [34]:
str(20) #convert integer to string

'20'

- When converting from a string, the string must contain a value that is valid for the target type

In [35]:
int('10p')

ValueError: invalid literal for int() with base 10: '10p'
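
- When the input may be invalid, the ValueError can be caught instead of letting it stop the program. A minimal sketch, assuming a hypothetical helper named to_int that falls back to a default value:

```python
def to_int(text, default=None):
    """Hypothetical helper: convert text to int, or return default on failure."""
    try:
        return int(text)
    except ValueError:
        # Raised when the string is not a valid integer literal, e.g. '10p'.
        return default

good = to_int('10')
bad = to_int('10p')
print(good, bad)
```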

## Built-In Data Structures

- Type: list, e.g., [1, 2, 3] (ordered collection)
- Type: tuple, e.g., (1, 2, 3) (immutable ordered collection)
- Type: dict, e.g., {'a':1, 'b':2, 'c':3} ((key, value) mapping; insertion order is preserved since Python 3.7)
- Type: set, e.g., {1, 2, 3} (unordered collection of unique values)

### Lists

- The basic ordered and mutable data collection type in Python
- Defined with comma-separated values between square brackets

In [41]:
L = [2, 3, 5, 7]

In [37]:
#length of a list
len(L)

4

In [42]:
#append a value to the end
L.append(11)
L

[2, 3, 5, 7, 11]

In [43]:
#addition concatenates lists
L + [13, 17, 19]

[2, 3, 5, 7, 11, 13, 17, 19]

In [44]:
#sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

[1, 2, 3, 4, 5, 6]

- Can contain objects of any type, or even a mix of types

In [45]:
L = [1, 'two', 3.14, [0, 3, 5]]

#### List Indexing and Slicing

- Access to elements in compound types can be done through indexing for single elements, and slicing for multiple elements
- Both are indicated by a square-bracket syntax

In [58]:
L = [2, 3, 5, 7, 11]

- Python uses zero-based indexing

In [47]:
L[0]

2

In [48]:
L[1]

3

- Elements at the end of the list can be accessed with negative numbers, starting from -1

In [49]:
L[-1]

11

In [50]:
L[-2]

7

- Slicing uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sublist
- For example, to get the first three elements of the list (a start index of 0 can be omitted):

In [51]:
L[0:3]

[2, 3, 5]

In [52]:
L[:3]

[2, 3, 5]

- If we leave out the last index, it defaults to the length of the list
- For example, the last three elements can be accessed:

In [59]:
L[-3:]

[5, 7, 11]

- It is possible to specify a third integer that represents the step size
- For example, to select every second element of the list:

In [53]:
L[::2] #equivalent to L[0:len(L):2]

[2, 5, 11]

- A negative step is particularly useful: a step of -1 reverses the list

In [54]:
L[::-1]

[11, 7, 5, 3, 2]

- Indexing and slicing can be used to set elements as well

In [55]:
L[0] = 100
print(L)

[100, 3, 5, 7, 11]


In [56]:
L[1:3] = [55, 56]
print(L)

[100, 55, 56, 7, 11]
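
- The slicing rules above can be checked directly. The sketch below is illustrative only; it also shows that a slice end past the end of the list is clipped rather than raising an error, and that L[:] makes a shallow copy

```python
L = [2, 3, 5, 7, 11]

# Omitted start/end default to the beginning and the length of the list.
same_step = (L[::2] == L[0:len(L):2])
same_tail = (L[-3:] == L[len(L) - 3:])

# A slice end beyond the list length is clipped instead of raising IndexError.
clipped = L[3:100]

# A full slice creates a new list, so changing the copy leaves L untouched.
copy = L[:]
copy[0] = 100

print(same_step, same_tail, clipped, L[0], copy[0])
```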


### Tuples

- Defined with parentheses

In [60]:
t = (1, 2, 3)

- Can also be defined without any brackets at all

In [61]:
t = 1, 2, 3
print(t)

(1, 2, 3)


- Tuples have a length, and individual elements can be extracted using square-bracket indexing

In [62]:
len(t)

3

In [63]:
t[0]

1

- Tuples are immutable: once they are created, their size and contents cannot be changed

In [64]:
t[1] = 4

TypeError: 'tuple' object does not support item assignment

In [65]:
t.append(4)

AttributeError: 'tuple' object has no attribute 'append'

- Tuples are often used by functions that return multiple values
- For example, the as_integer_ratio() method of floating-point objects returns a numerator and a denominator

In [66]:
x = 0.125
x.as_integer_ratio()

(1, 8)

- These multiple return values can be individually assigned:

In [67]:
numerator, denominator = x.as_integer_ratio()
print(numerator / denominator)

0.125


- Indexing and slicing work for tuples as well
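
- A short sketch of the same indexing and slicing syntax applied to a tuple; note that slicing a tuple returns a new tuple

```python
t = (1, 2, 3)

last = t[-1]          # negative index counts from the end
first_two = t[:2]     # slice returns a new tuple
reversed_t = t[::-1]  # negative step reverses

print(last, first_two, reversed_t)
```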

### Dictionaries

- Extremely flexible mappings of keys to values
- Created via a comma-separated list of key:value pairs within curly braces

In [68]:
numbers = {'one':1, 'two':2, 'three':3}

- Items are accessed and set via the indexing syntax, except that the index is not a zero-based position but a valid key in the dictionary

In [69]:
#access a value via the key
numbers['two']

2

- New items can be added to the dictionary using indexing

In [70]:
#set a new key:value pair
numbers['ninety'] = 90
print(numbers)

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}


- Since Python 3.7, dictionaries preserve the order in which keys were inserted (as the output above shows); in earlier versions no order was guaranteed
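
- Indexing with a key that is not in the dictionary raises a KeyError. A minimal sketch of the usual alternatives, membership tests with in and lookups with dict.get, plus iteration over the (key, value) pairs:

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Membership tests check the keys, not the values.
has_two = 'two' in numbers

# get() returns None (or a supplied default) instead of raising KeyError.
missing = numbers.get('four')
fallback = numbers.get('four', 0)

# items() yields (key, value) pairs.
pairs = [(key, value) for key, value in numbers.items()]

print(has_two, missing, fallback, pairs)
```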

### Sets

- Contain unordered collections of unique items
- Defined like lists and tuples, but with the curly brackets of dictionaries

In [71]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

In [72]:
#union: items appearing in either
primes | odds #with an operator
primes.union(odds) #with a method

{1, 2, 3, 5, 7, 9}

In [73]:
#intersection: items appearing in both
primes & odds #with an operator
primes.intersection(odds) #with a method

{3, 5, 7}

In [74]:
#difference: items in primes but not in odds
primes - odds #with an operator
primes.difference(odds) #with a method

{2}

In [75]:
#symmetric difference: items appearing only in one set
primes ^ odds #with an operator
primes.symmetric_difference(odds) #with a method

{1, 2, 9}
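
- Because set items are unique, converting a list to a set is a simple way to drop duplicates. A small illustrative sketch (the variable names are not from the original notes):

```python
values = [3, 1, 3, 2, 1]

# Duplicates collapse when the list is converted to a set.
unique = set(values)

# Sets are unordered, so sort explicitly when a stable order is needed.
sorted_unique = sorted(unique)

print(len(unique), sorted_unique)
```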

### More Specialized Data Structures

- Can be found in the collections module
- collections.namedtuple: like a tuple, but each value has a name
- collections.defaultdict: like a dictionary, but unspecified keys have a user-specified default value
- collections.OrderedDict: like a dictionary, but the order of keys is maintained
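
- A brief sketch of the three types listed above. The names Point, counts, and od are illustrative assumptions, not part of the original notes

```python
from collections import namedtuple, defaultdict, OrderedDict

# namedtuple: fields are accessible by name as well as by position.
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)

# defaultdict: missing keys are created with the default factory's value (int() is 0).
counts = defaultdict(int)
for word in ['spam', 'eggs', 'spam']:
    counts[word] += 1

# OrderedDict: keeps key order and offers order-specific methods such as move_to_end().
od = OrderedDict()
od['first'] = 1
od['second'] = 2
od.move_to_end('first')

print(p.x, p[1], counts['spam'], list(od))
```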