# sorting basics
In Python, we rarely write a sorting algorithm ourselves; we call the built-in tools: the list method sort and the function sorted.
Note that the sort method does not return a sorted version of the list. In fact, it returns the value None. But the list itself has been modified. This kind of operation that works by having a side effect on the list can be quite confusing.
In this course, we will generally use an alternative way of sorting, the function sorted rather than the method sort. Because it is a function rather than a method, it is invoked on a list by passing the list as a parameter inside the parentheses, rather than putting the list before the period. More importantly, sorted does not change the original list. Instead, it returns a new list.

In [2]:
#sort method
L1 = [1, 7, 4, -2, 3]
L2 = ["Cherry", "Apple", "Blueberry"]
L3 = ["Cherry", "Apple", "blueberry"]
L1.sort()
print(L1)
L2.sort()
print(L2)
L3.sort()
print(L3)

[-2, 1, 3, 4, 7]
['Apple', 'Blueberry', 'Cherry']
['Apple', 'Cherry', 'blueberry']
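The last result may look odd, since 'blueberry' lands after 'Cherry'. A small sketch of why, assuming the default string comparison, which compares characters by their Unicode code points (uppercase ASCII letters have smaller code points than lowercase ones):

```python
# Strings are compared character by character using code points.
code_C = ord('C')
code_b = ord('b')
print(code_C, code_b)            # 67 98

# 'C' has the smaller code point, so "Cherry" sorts before "blueberry".
print('Cherry' < 'blueberry')    # True

mixed_case = ["Cherry", "Apple", "blueberry"]
default_order = sorted(mixed_case)
print(default_order)             # ['Apple', 'Cherry', 'blueberry']
```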


In [4]:
#sorted function
L2 = ["Cherry", "Apple", "Blueberry"]

L3 = sorted(L2)
print(L3)
print(sorted(L2))
print(L2) # unchanged

print('-'*10)
print(L2.sort())

['Apple', 'Blueberry', 'Cherry']
['Apple', 'Blueberry', 'Cherry']
['Cherry', 'Apple', 'Blueberry']
----------
None
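One reason the side effect of sort can be confusing is aliasing: if two names refer to the same list, sorting through one name changes what the other sees. A minimal sketch (the variable names here are made up for illustration):

```python
original = [3, 1, 2]
alias = original             # a second name for the same list object

original.sort()              # sorts in place and returns None
print(alias)                 # [1, 2, 3]: the alias sees the change too

data = [3, 1, 2]
ordered = sorted(data)       # builds and returns a new list
print(data)                  # [3, 1, 2]: unchanged
print(ordered)               # [1, 2, 3]

# sorted accepts any iterable, not only lists; it always returns a list.
letters = sorted("cab")
print(letters)               # ['a', 'b', 'c']
```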


In [8]:
#Optional reverse parameter

#The sorted function takes some optional parameters 
#The first optional parameter is a key function 
#The second optional parameter is a Boolean value which determines whether to sort the items in reverse order. 
#By default, it is False, but if you set it to True, the list will be sorted in reverse order.

L2 = ["Cherry", "Apple", "Blueberry"]
print(sorted(L2, reverse=True))

#This is a situation where it is convenient to use the keyword mechanism for providing optional parameters.
#In Python 3, key and reverse are keyword-only parameters of sorted, so they cannot be passed by position.
#The call below raises a TypeError, which is why it is commented out:
#print(sorted(L2, None, True))
L1 = [1, 7, 4, -2, 3]
print(sorted(L1, key=None, reverse=True))  #naming the parameters works

['Cherry', 'Blueberry', 'Apple']
[7, 4, 3, 1, -2]
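The commented-out call sorted(L2, None, True) in the cell above fails because, in Python 3, sorted only accepts key and reverse as keyword arguments. A short sketch that catches the error instead of crashing (the flag name positional_failed is just for illustration):

```python
L2 = ["Cherry", "Apple", "Blueberry"]

try:
    sorted(L2, None, True)           # key and reverse passed by position
except TypeError as err:
    positional_failed = True
    print("TypeError:", err)
else:
    positional_failed = False

# Naming the parameters is the supported way to pass them.
by_keyword = sorted(L2, key=None, reverse=True)
print(by_keyword)                    # ['Cherry', 'Blueberry', 'Apple']
```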


In [18]:
#Optional key parameter

#If you want to sort things in some order other than the “natural” or its reverse, you can provide an additional parameter, the key parameter.
L1 = [1, 7, 4, -2, 3]

def absolute(x):
    if x >= 0:
        return x
    else:
        return -x

L2 = sorted(L1, key=absolute)
print(L2)
#Yes, we can do that, though Python already has its own absolute-value function, abs:
L2 = sorted(L1, key=abs)
print(L2)

#default value of this key parameter is None
#Note that this code never explicitly calls the absolute function at all. It passes the absolute function as a parameter value to the sorted function. Inside the sorted function, whose code we haven’t seen, that function gets invoked.

nums = ['1450', '33', '871', '19', '14378', '32', '1005', '44', '8907', '16']
nums_sorted_lambda = sorted(nums, key=lambda s: s[-1], reverse=True)  #sort by the last character, highest first
nums_sorted_lambda

[1, -2, 3, 4, 7]
[1, -2, 3, 4, 7]


['19', '14378', '8907', '16', '1005', '44', '33', '32', '871', '1450']
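To make it visible that sorted is the one calling the key function, here is a small sketch with a hypothetical key function, traced_abs, that records every value it is handed. The second part uses the existing string method str.lower as a key to get a case-insensitive order:

```python
calls = []

def traced_abs(x):
    # Record each value that sorted passes to the key function.
    calls.append(x)
    return abs(x)

L1 = [1, 7, 4, -2, 3]
result = sorted(L1, key=traced_abs)   # we never call traced_abs ourselves
print(result)                         # [1, -2, 3, 4, 7]
print(len(calls))                     # 5: one call per element

# Any one-argument function can be a key, including methods such as str.lower.
L3 = ["Cherry", "Apple", "blueberry"]
case_insensitive = sorted(L3, key=str.lower)
print(case_insensitive)               # ['Apple', 'blueberry', 'Cherry']
```

The original items are returned, not the key values: -2 stays -2 in the result even though it was ranked by its absolute value.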

In [22]:
#sorting a dictionary

#e.g. of frequency counts of chars in a string
stre='hey i had to move on!'
s=stre.split()
d = {}
for word in s:
    for x in word:
        if x in d:
            d[x] = d[x] + 1
        else:
            d[x] = 1
y = sorted(d.keys())
for k in y:
    print("{} appears {} times".format(k, d[k]))

! appears 1 times
a appears 1 times
d appears 1 times
e appears 2 times
h appears 2 times
i appears 1 times
m appears 1 times
n appears 1 times
o appears 3 times
t appears 1 times
v appears 1 times
y appears 1 times


In [25]:
#The previous cell sorted by letter, i.e. by the keys of the dictionary. Here we sort by the values instead, from the most frequent letter to the least frequent.
d = {}
stre='hey i had to move on'
s=stre.split()
for L in s:
    for x in L:
        if x in d:
            d[x] = d[x] + 1
        else:
            d[x] = 1

#The line below can be confusing, so step by step:
#1. sorted is invoked with the dictionary d as its first argument; iterating over a dictionary yields its keys, so the keys are what get sorted.
#2. The key function, lambda k: d[k], decorates each dictionary key with a post-it note containing that key’s value in d; the keys are ordered by those notes.
#3. reverse=True sorts from the highest count to the lowest.
for k in sorted(d, key=lambda k: d[k], reverse=True):
    print("{} appears {} times".format(k, d[k]))
#Note that key=lambda k: d[k] does not by itself say "sort the keys of a dictionary".
#The keys are passed as the first argument in the call to sorted; the key parameter only provides a function that says how to order them.


o appears 3 times
h appears 2 times
e appears 2 times
y appears 1 times
i appears 1 times
a appears 1 times
d appears 1 times
t appears 1 times
m appears 1 times
v appears 1 times
n appears 1 times


In [None]:
#another way to sort dictionaries
#There is another way to sort dictionaries, by calling .items() to extract a sequence of (key, value) tuples, and then sorting that sequence of tuples. But it’s better to learn the pythonic way of doing it, sorting the dictionary keys using a key function that takes one key as input and looks up the value in the dictionary.

In [27]:
dictionary = {"Flowers": 10, 'Trees': 20, 'Chairs': 6, "Firepit": 1, 'Grill': 2, 'Lights': 14}
sorted_values = sorted(dictionary, key=lambda k: dictionary[k], reverse=True)
sorted_values

['Trees', 'Lights', 'Flowers', 'Chairs', 'Grill', 'Firepit']
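For comparison, the .items() approach mentioned above can be sketched like this. It sorts (key, value) tuples rather than keys, so the key function picks out the value at index 1 (the names pairs and keys_by_value are made up for this example):

```python
dictionary = {"Flowers": 10, "Trees": 20, "Chairs": 6,
              "Firepit": 1, "Grill": 2, "Lights": 14}

# Each item is a (key, value) tuple; order the tuples by their value.
pairs = sorted(dictionary.items(), key=lambda pair: pair[1], reverse=True)
print(pairs)

# Keep only the keys to get the same list as sorting the keys directly.
keys_by_value = [k for k, v in pairs]
print(keys_by_value)   # ['Trees', 'Lights', 'Flowers', 'Chairs', 'Grill', 'Firepit']
```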

#### Breaking Ties: Second Sorting
What happens when two items are “tied” in the sort order? For example, suppose we sort a list of words by their lengths. Which five letter word will appear first?

The answer is that Python’s sort is stable: tied items stay in the same relative order they were in before the sorting.
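A quick sketch of that behavior, using a made-up word list sorted by length only:

```python
words = ['peach', 'kiwi', 'apple', 'mango', 'pear']

# Only the length is compared, so words of equal length are "tied".
by_length = sorted(words, key=len)
print(by_length)   # ['kiwi', 'pear', 'peach', 'apple', 'mango']
```

Among the five-letter words, 'peach' comes first simply because it came first in the input.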

What if we wanted to sort them by some other property, say alphabetically, when the words were the same length? Python allows us to specify multiple conditions when we perform a sort by returning a tuple from a key function.

First, let’s see how Python sorts tuples. We’ve already seen that there’s a built-in sort order, if we don’t specify any key function. For numbers, it’s lowest to highest. For strings, it’s alphabetic order (with uppercase letters ahead of lowercase ones, as the earlier “blueberry” example showed). For a sequence of tuples, the default sort order is based on the default sort order for the first elements of the tuples, with ties being broken by the second elements, and then third elements if necessary, etc. For example,

In [28]:
tups = [('A', 3, 2),
        ('C', 1, 4),
        ('B', 3, 1),
        ('A', 2, 4),
        ('C', 1, 2)]
for tup in sorted(tups):
    print(tup)

('A', 2, 4)
('A', 3, 2)
('B', 3, 1)
('C', 1, 2)
('C', 1, 4)


In [None]:
#sort a list of fruit words first by their length, smallest to largest, and then alphabetically to break ties among words of the same length.
#To do that, we have the key function return a tuple whose first element is the length of the fruit’s name, and second element is the fruit name itself.

fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)

In [30]:
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (len(fruit_name), fruit_name), reverse=True)
for fruit in new_order:
    print(fruit)

#The problem here: reverse=True reverses everything, so the alphabetical tiebreak is reversed too (peach now comes before apple).
#One solution is to drop reverse=True and add a negative sign in front of len(fruit_name), which turns every length into a negative number. As a result, the longest names get the smallest keys and come first, the shortest come last, and ties are still broken alphabetically.

blueberry
papaya
peach
mango
apple
pear
kiwi


In [29]:
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']
new_order = sorted(fruits, key=lambda fruit_name: (-len(fruit_name), fruit_name))
for fruit in new_order:
    print(fruit)

blueberry
papaya
apple
mango
peach
kiwi
pear
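The negative-sign trick only works for numeric parts of the key, since a string cannot be negated. One alternative, sketched here as an illustration rather than as the course's method, relies on the stability of the sort: sort by the tiebreak criterion first, then by the main criterion. The goal in this example is shortest names first, with same-length names in reverse alphabetical order:

```python
fruits = ['peach', 'kiwi', 'apple', 'blueberry', 'papaya', 'mango', 'pear']

# Pass 1: the tiebreak criterion (name, Z to A).
by_name_desc = sorted(fruits, reverse=True)

# Pass 2: the main criterion (length, shortest first).
# Because the sort is stable, same-length names keep their pass-1 order.
new_order = sorted(by_name_desc, key=len)
print(new_order)
# ['pear', 'kiwi', 'peach', 'mango', 'apple', 'papaya', 'blueberry']
```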


In [None]:
weather = {'Reykjavik': {'temp':60, 'condition': 'rainy'},
           'Buenos Aires': {'temp': 55, 'condition': 'cloudy'},
           'Cairo': {'temp': 96, 'condition': 'sunny'},
           'Berlin': {'temp': 89, 'condition': 'sunny'},
           'Caloocan': {'temp': 78, 'condition': 'sunny'}}

sorted_weather = sorted(weather, key=lambda w: (w, weather[w]['temp']))

#first city name (alphabetically), then temperature (lowest to highest)
#the city names here are all different, so the temperature tiebreak never actually comes into play
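To see a tuple key where the second element matters, here is a sketch on the same weather data with a different, made-up ordering: group the cities by condition (alphabetically), and within the same condition put the hottest city first.

```python
weather = {'Reykjavik': {'temp': 60, 'condition': 'rainy'},
           'Buenos Aires': {'temp': 55, 'condition': 'cloudy'},
           'Cairo': {'temp': 96, 'condition': 'sunny'},
           'Berlin': {'temp': 89, 'condition': 'sunny'},
           'Caloocan': {'temp': 78, 'condition': 'sunny'}}

# First element: condition, A to Z. Second element: negated temp, so hottest first.
by_condition = sorted(weather,
                      key=lambda w: (weather[w]['condition'], -weather[w]['temp']))
print(by_condition)
# ['Buenos Aires', 'Reykjavik', 'Cairo', 'Berlin', 'Caloocan']
```

Three cities share the condition 'sunny', so the negated temperature decides their order.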

It’s generally best to use lambda expressions until the process becomes too complicated to express clearly in one; at that point, define a named function.

In the example below, the property we want to sort by is the number of cities in each state that begin with the letter ‘S’. The function defining this property is harder to express, requiring a filter and count accumulation pattern. So we are better off defining a separate, named function. Here, we’ve chosen to make a lambda expression that looks up the value associated with the particular state and passes that value to the named function s_cities_count. We could have passed just the key, but then the function would have to look up the value, and it would be a little confusing, from the code, to figure out what dictionary the key is supposed to be looked up in. Here, we’ve done the lookup right in the lambda expression, which makes it a little bit clearer that we’re just sorting the keys of the states dictionary based on a property of their values. It also makes it easier to reuse the counting function on other city lists, even if they aren’t embedded in that particular states dictionary.

In [None]:
def s_cities_count(city_list):
    ct = 0
    for city in city_list:
        if city[0] == "S":
            ct += 1
    return ct

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

print(sorted(states, key=lambda state: s_cities_count(states[state])))
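A sketch of the same idea that also prints the intermediate counts, so the resulting order is easier to check. The counting function is written here with str.startswith and sum as one possible variant, and other_cities is a made-up list that shows the function being reused outside the states dictionary:

```python
def s_cities_count(city_list):
    # Count the cities whose name starts with "S".
    return sum(1 for city in city_list if city.startswith("S"))

states = {"Minnesota": ["St. Paul", "Minneapolis", "Saint Cloud", "Stillwater"],
          "Michigan": ["Ann Arbor", "Traverse City", "Lansing", "Kalamazoo"],
          "Washington": ["Seattle", "Tacoma", "Olympia", "Vancouver"]}

counts = {state: s_cities_count(cities) for state, cities in states.items()}
print(counts)    # {'Minnesota': 3, 'Michigan': 0, 'Washington': 1}

ordered_states = sorted(states, key=lambda state: s_cities_count(states[state]))
print(ordered_states)    # ['Michigan', 'Washington', 'Minnesota']

# The counting function does not depend on the states dictionary.
other_cities = ["Salem", "Portland", "Springfield"]   # hypothetical list
print(s_cities_count(other_cities))                   # 2
```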


# notes on project
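1. replace (a string method) does not return None, and it does not change the string in place; it is not like append. It returns a new string, so we have to assign the result back:
s = s.replace(old, new)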
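A small sketch of the difference between replace and append (the variable names are made up for this example):

```python
s = "hello, world!"
s.replace(",", "")                 # the new string is discarded
after_discard = s                  # s has not changed

s = s.replace(",", "")             # assign the result back to keep it
after_assign = s

items = [1, 2]
append_result = items.append(3)    # append changes the list in place and returns None

print(after_discard)               # hello, world!
print(after_assign)                # hello world!
print(items, append_result)        # [1, 2, 3] None
```

Strings are immutable, so every string method that appears to change a string actually returns a new one.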


In [None]:
punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
#lists of words to use
positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())

negative_words = []
with open("negative_words.txt") as neg_f:
    for lin in neg_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())
            
            
def get_pos(st):
    c=0
    for word in strip_punctuation(st).lower().split():
        if word in positive_words:
            c+=1
    return c


def get_neg(st):
    c=0
    for word in strip_punctuation(st).lower().split():
        if word in negative_words:
            c+=1
    return c


def strip_punctuation(s):
    #replace returns a new string, so the result is assigned back to s each time
    for char in punctuation_chars:
        s = s.replace(char, '')
    return s

#"with" closes both files when the block ends, so every written row is flushed to disk
with open('project_twitter_data.csv') as handle, open('resulting_data.csv', 'w') as outfile:
    #header row of the output file
    outfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
    outfile.write('\n')
    lines = handle.readlines()       #input columns: 1. text 2. #retweets 3. #replies
    for line in lines[1:]:           #skip the header row of the input file
        listx = line.strip().split(',')
        pos = get_pos(listx[0])      #score the tweet text once and reuse the results
        neg = get_neg(listx[0])
        outfile.write('{},{},{},{},{}'.format(listx[1], listx[2], pos, neg, pos - neg))
        outfile.write('\n')
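To try the scoring logic without the project files, here is a self-contained sketch. Everything in it that is not in the cell above is an assumption made for illustration: the tiny word sets, the sample tweets, the input column names, and the helper count_matches. It also swaps in the standard csv module and in-memory text (io.StringIO) instead of split(',') on real files.

```python
import csv
import io

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", "#", "@"]

# Hypothetical stand-ins for positive_words.txt and negative_words.txt.
positive_words = {"love", "great"}
negative_words = {"bad", "awful"}

def strip_punctuation(s):
    # replace returns a new string, so assign the result back each time
    for char in punctuation_chars:
        s = s.replace(char, "")
    return s

def count_matches(text, vocabulary):
    # Count the words of text that appear in vocabulary (hypothetical helper).
    words = strip_punctuation(text).lower().split()
    return sum(1 for word in words if word in vocabulary)

# Hypothetical sample data in the same column order: text, retweets, replies.
sample_csv = io.StringIO(
    "tweet_text,retweet_count,reply_count\n"
    "I love this great day!,3,1\n"
    "Awful weather: bad bad,0,2\n"
)

reader = csv.reader(sample_csv)
next(reader)                         # skip the header row
rows = []
for text, retweets, replies in reader:
    pos = count_matches(text, positive_words)
    neg = count_matches(text, negative_words)
    rows.append([retweets, replies, pos, neg, pos - neg])

for row in rows:
    print(row)
# ['3', '1', 2, 0, 2]
# ['0', '2', 0, 3, -3]
```

Storing the word lists as sets makes each membership test fast, which can matter when the real word files are long.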