Primitive Data Types
--------------------

These are the basic data types from which all of the more complex data structures in Python are built. The basic data types are the following:

* Strings (for text)
* Numeric types (integers and decimals)
* Booleans


### String:

String variables store textual data: single characters and sequences of characters. A string is specified by surrounding some text with single `'` or double `"` quotes. 

In [3]:
str_1 = "Hello World!"
print(str_1)

str_1 = 'Hello World!'
print(str_1)

Hello World!
Hello World!


In [4]:
# Notice that the string below contains the \n character, 
# which is the "new line" special character
str_2 = "Hello World!\n\n\nHello World Twice!"
print(str_2)

Hello World!


Hello World Twice!


In [5]:
# Let's use the \t character which is the special character for tab
str_3 ="Hello\tWorld!\tWe\tare\tfar\taway\n"
print(str_3)

Hello	World!	We	are	far	away



In [6]:
print("I want to print backslash: \\")

I want to print backslash: \


In [7]:
str_4 = 'This is a string within single quotes that can contain "double quotes" as part of the string\n'
print(str_4)

This is a string within single quotes that can contain "double quotes" as part of the string



In [8]:
str_5 = 'If we want to have \'single quotes\' in single quoted string we should escape them\n'
print(str_5)

If we want to have 'single quotes' in single quoted string we should escape them



In [9]:
str_6 = "Similarly, if we want to have \"double quotes\" in double quoted string we should escape them\n"
print(str_6)

Similarly, if we want to have "double quotes" in double quoted string we should escape them



In [10]:
str_7 = "hello"
str_8 = "world"
hello_world_message = str_7 + " " + str_8 + "!" 
print(hello_world_message) # note that + concatenates strings

hello world!


In [11]:
str_9 = '''
If we want to have multiple lines in the string
then we can use triple quotes: This is a multiline
string!
'''
print(str_9)


If we want to have multiple lines in the string
then we can use triple quotes: This is a multiline
string!



In [12]:
# Triple quotes are useful for multiline pieces of text (e.g., newspaper articles)
str_10 = """
(CNN)AirAsia Flight QZ8501 climbed rapidly before it crashed, a top Indonesian official said Tuesday, according to The Jakarta Post.

Then the plane stalled, Transportation Minister Ignasius Jonan said at a parliamentary hearing, according to the AFP and Reuters news agencies.

"The plane, during the last minutes, went up faster than normal speed ... after then, it stalled. That is according to the data from the radar," Jonan said, according to the news agencies.
"""
print(str_10)


(CNN)AirAsia Flight QZ8501 climbed rapidly before it crashed, a top Indonesian official said Tuesday, according to The Jakarta Post.

Then the plane stalled, Transportation Minister Ignasius Jonan said at a parliamentary hearing, according to the AFP and Reuters news agencies.

"The plane, during the last minutes, went up faster than normal speed ... after then, it stalled. That is according to the data from the radar," Jonan said, according to the news agencies.



#### "Raw" strings

Prefix a string with `r` to make it a `raw` string, in which backslashes are kept literally and sequences such as \t and \n are not treated as escape characters. Raw strings are handy when writing regular expressions.

In [13]:
print("e.g., type C:\teaching\ instead of C:\\teaching\\)")

e.g., type C:	eaching\ instead of C:\teaching\)


In [14]:
print(r"e.g., type C:\teaching\ instead of C:\\teaching\\)")

e.g., type C:\teaching\ instead of C:\\teaching\\)
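To illustrate why raw strings help with regular expressions, here is a minimal sketch using the standard `re` module. The sample sentence and the variable names (`text`, `date_pattern`, `found_date`) are made up for this illustration.

```python
import re

# Hypothetical sample text, used only for illustration
text = "The report was filed on 2015-01-20."

# In a raw string, \d reaches the regex engine unchanged
date_pattern = r"\d{4}-\d{2}-\d{2}"
match = re.search(date_pattern, text)
found_date = match.group(0)
print(found_date)

# r"\n" is two characters (backslash and n), while "\n" is a single newline
print(len(r"\n"), len("\n"))
```

Without the `r` prefix, every backslash in the pattern would have to be doubled, which quickly becomes hard to read.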


### Accessing parts of the string

**Note: The following instructions will be re-used later for other data structures (e.g., lists), so pay attention!**

Strings can be indexed (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [15]:
word = 'Python'

In [16]:
word[0]  # character in position 0

'P'

In [17]:
word[1]

'y'

In [18]:
word[5]  # character in position 5

'n'

Indices may also be negative numbers, to start counting from the right:

In [19]:
word[-1]  # last character

'n'

In [20]:
word[-2]  # second-last character

'o'

In [21]:
word[-6]

'P'

In addition to indexing, slicing is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [22]:
word[0:2]  # characters from position 0 (included) to 2 (excluded)

'Py'

In [23]:
word[2:5]  # characters from position 2 (included) to 5 (excluded)

'tho'

In [24]:
word[2:]  # characters from position 2 (included) to the end

'thon'

In [25]:
word[:3]  # characters from beginning (position 0) to position 3 (excluded)

'Pyt'

In [26]:
word[-3:] # last three characters

'hon'

In [28]:
word[:-3] # everything except the last three characters

'Pyt'

In [27]:
word[-3:-1] # third-last and second-last characters (the last one is excluded)

'ho'
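As a small sketch of how slice bounds behave, the cell below checks two properties that are easy to miss: omitted bounds default to the start and the end of the string, and out-of-range slice bounds are tolerated while an out-of-range index is not. The names `left` and `right` are introduced here only for illustration.

```python
word = 'Python'

# Omitted bounds default to the start and the end of the string,
# so the two pieces always recombine into the original
left, right = word[:2], word[2:]
print(left + right == word)

# Out-of-range slice bounds are handled gracefully...
print(word[4:42])

# ...whereas an out-of-range index raises an IndexError
try:
    word[42]
except IndexError as error:
    print("IndexError:", error)
```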

#### Exercise

* Assign the string 'Data Care Feeding & Cleaning' to a Python variable. 
* Print the word 'Data' using the indexing/slicing approach.
* Print the word 'Cleaning' using the negative indexing/slicing approach.

In [31]:
# your own code here
var_data = 'Data Care Feeding & Cleaning'
print(var_data[0:4])
print(var_data[-8:])

Data
Cleaning


### Operations on Strings 

We've already seen one of the most common string operators, `+`, used for string concatenation, the indexing operation to get specific characters, and the slicing operation to get substrings. 

Below are some of the more commonly used string operations:

+ `+` : concatenate two strings
+ `len(str)`: length of a string, number of characters
+ `str.upper()`: returns an uppercase version of a string
+ `str.lower()`: returns a lowercase version of a string
+ `haystack.find(needle)`: searches haystack for needle and returns the position of the first occurrence, indexed from 0. Returns -1 if not found.
+ `str_1.count(str_2)`: counts the number of occurrences of one string in another.
+ `haystack.startswith(needle)`: does the haystack string start with the needle string?
+ `haystack.endswith(needle)`: does the haystack string end with the needle string?
+ `str_1.split(str_2)`: split the first string at every occurrence of the second string. Outputs a list (see below).
+ `==`: are the two operand strings the same?
+ `str.strip()`: remove any whitespace from the left or right of the string, including newlines. 

A more complete list of string operations is [available here](http://docs.python.org/2/library/string.html).
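Since `str.strip()` appears in the list above, here is a minimal sketch of it. The input string `raw_line` is a made-up example of text with stray whitespace, as one often gets when reading lines from a file.

```python
# Hypothetical raw input with stray whitespace on both sides
raw_line = "   practical data science \n"

# strip() removes whitespace (including the newline) from both ends
cleaned = raw_line.strip()
print("|" + cleaned + "|")

# rstrip() only removes whitespace on the right; the leading spaces stay
right_cleaned = raw_line.rstrip()
print("|" + right_cleaned + "|")

# Whitespace inside the string is left untouched
print(len(raw_line), len(cleaned))
```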

In [32]:
word = "Python is the word. And on and on and on and on..." 
print(len(word))

50


In [38]:
print("The length of the word above is "+ str(len(word))+ " characters")

The length of the word above is 50 characters


In [34]:
print(word.lower())

python is the word. and on and on and on and on...


In [35]:
print(word.upper())

PYTHON IS THE WORD. AND ON AND ON AND ON AND ON...


In [39]:
word = "Python is the word. And on and on and on and on..." 
ind = word.find("on")
print(ind)

4


In [40]:
print("The first time that we see the string on is at position", word.find("on"))

The first time that we see the string on is at position 4


In [41]:
first_appearance = word.find("on")
second_appearance = word.find("on",first_appearance+1)
print("The second time that we see the string on is at position", second_appearance)

The second time that we see the string on is at position 24


In [42]:
# Looking for the string "on" in the second half of the big string called "word"
midpoint = len(word) // 2 # integer division finds the middle of the string word
second_half_appearance = word.find("on",midpoint)
print("First time that we see 'on' in the second half: ", second_half_appearance)

First time that we see 'on' in the second half:  31
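The optional start argument of `find` can be applied repeatedly to collect every occurrence. The loop below is one possible sketch, not the only way to do it; the names `positions`, `start`, and `index` are chosen here for illustration.

```python
word = "Python is the word. And on and on and on and on..."

positions = []
start = 0
while True:
    index = word.find("on", start)  # search from position `start` onwards
    if index == -1:                 # -1 means there are no more occurrences
        break
    positions.append(index)
    start = index + 1               # continue just after the last match

print(positions)
# The number of positions found agrees with word.count("on")
print(len(positions) == word.count("on"))
```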


In [47]:
word = "Python is the word. And on and on and on and on..."
lookfor = "Python"
count = word.count(lookfor)
print("We see the string '" + lookfor + "' this many times:", count)

We see the string 'Python' this many times: 1


In [48]:
str_1 = "Hello"
str_2 = "World"
print("concatenation:")
print(str_1 + " " + str_2)
print(str_1 + " everybody")

concatenation:
Hello World
Hello everybody


In [49]:
print("length:")
print(len(str_1))
print(len(str_1 + " " + str_2))

length:
5
11


In [50]:
print("string casing:")
print(str_1.upper())
print("HELLO".lower())

string casing:
HELLO
hello


In [51]:
print("string indexing:")
print("hello".find("ll"))
print("hello".upper().find("LL"))

string indexing:
2
2


In [52]:
print("string count:")
print(str_1.count("l"))
print(str_1.count("ll"))

string count:
2
1


In [53]:
print("starts with & endswith:")
print("hello".startswith("he"))
print("hello".endswith("world"))

starts with & endswith:
True
False


In [54]:
print("split:")
print("practical data science".split(" "))
print("hello".split(" "))
print("practical data science".split("a"))

split:
['practical', 'data', 'science']
['hello']
['pr', 'ctic', 'l d', 't', ' science']


In [55]:
str_1 = "hello"
print("equality:")
print(str_1 == "hello")

print(str_1 == "Hello")

equality:
True
False


In [56]:
mystring1 = "practical data science"
mylist1 = mystring1.split(" ")
print(mystring1)
print(mylist1)

practical data science
['practical', 'data', 'science']


### String Formatting

Often one wants to embed other information into strings, sometimes with special formatting constraints. In Python, one may insert special formatting characters into strings that convey what type of data should be inserted, where it should go, and how the "stringified" form should be formatted. For instance:

In [57]:
print('Coordinates: {0}, {1}'.format('37.24N', '115.81W'))

Coordinates: 37.24N, 115.81W


We can also reorder the placeholders and use each of them multiple times:

In [58]:
print('Latitude: {0}, Longitude: {1} ==> [{0}, {1}]'.format('37.24N', '115.81W'))

Latitude: 37.24N, Longitude: 115.81W ==> [37.24N, 115.81W]


Alternatively, instead of using the {0}, {1}, etc. format, we can specify names for the attributes:

In [59]:
print('Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='115.81W'))

Coordinates: 37.24N, 115.81W


We can also pass dictionaries (we will examine dictionaries later) whose keys match the placeholder names:

In [60]:
coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
print('Coordinates: {latitude}, {longitude}'.format(**coord))

Coordinates: 37.24N, -115.81W


#### More formatting options

As seen above, we can insert a value using either the {number} or the {name} placeholder. Below we will see a few more options, mainly for formatting numbers. (For a more detailed treatment of string formatting options, [see here](https://docs.python.org/3.5/library/string.html#format-string-syntax).) To apply formatting, add the character `:` after the number/name, followed by a set of formatting options.

```
field       ::=  "{" field_name [":" format_spec] "}"
format_spec ::=  [[fill character]align][sign][width][,][.precision][type]
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  minimum total number of characters (a leading 0 in width adds zero-padding)
precision   ::=  number of digits after the decimal point
```

Some common `type`s: 

* `d` integer
* `f` floating point
* `%` percent
* `e` exponential format
* `c` character
* `s` string 
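As a quick sketch of the `sign` option and the `d`, `e`, `c`, and `s` types listed above, the cell below formats a few made-up values. The list name `examples` is introduced here only for illustration.

```python
examples = [
    "|{:5d}|".format(42),          # integer, right-aligned in five characters
    "|{:+d}|".format(42),          # integer with an explicit sign
    "|{:.2e}|".format(12345.678),  # exponential format with two decimals
    "|{:c}|".format(65),           # character with code point 65
    "|{:>8s}|".format("text"),     # string, right-aligned in eight characters
]

for line in examples:
    print(line)
```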


In [61]:
# At least six characters in total (including the decimal point), three of them for the decimals
print("Result: |{num:6.3f}|".format(num=100.0/23))

Result: | 4.348|


In [62]:
# At least six characters in total (including the decimal point), three of them for the decimals, with zero padding in front
print("Result: |{num:06.3f}|".format(num=100.0/23))

Result: |04.348|


In [63]:
# Floating point with three decimal digits
print("Result: |{num:.3f}|".format(num=100.0/23))

Result: |4.348|


In [64]:
# At least sixteen characters in total, four decimal digits, and comma-separated thousands
print("Result: |{num:16,.4f}|".format(num=1000000.0/7))
print("Result: |{num:16,.4f}|".format(num=100.0/7))

Result: |    142,857.1429|
Result: |         14.2857|


In [65]:
# Expressing a percentage:
points = 19
total = 22
print('Correct answers: {:.2%}'.format(points/total))

Correct answers: 86.36%


In [66]:
# alignment
print('|{message:<30}|'.format(message='left aligned'))
print('|{message:>30}|'.format(message='right aligned'))
print('|{message:^30}|'.format(message='centered'))


|left aligned                  |
|                 right aligned|
|           centered           |


In [67]:
# fill
print('|{message:*<80}|'.format(message='left aligned with * chars as fill'))
print('|{message:#>80}|'.format(message='right aligned with # chars as fill'))
print('|{message:#^80}|'.format(message='centered with # chars as fill'))

|left aligned with * chars as fill***********************************************|
|##############################################right aligned with # chars as fill|
|#########################centered with # chars as fill##########################|


#### Exercise

* Save your name in a variable
* Save the name of our course in a variable
* Print the sentence saying that you are taking this course using .format syntax

In [71]:
# your own code here
name = 'Konstantin Bauman'
cname = 'Data Care Feeding & Cleaning'
print('{instructor} teaches the {course} course.'.format(instructor=name, course=cname))

Konstantin Bauman teaches the Data Care Feeding & Cleaning course.
