# Case Study 1: Data Science with Twitter Data

**Required Readings:** 
* Chapter 1 and Chapter 9 of the book [Mining the Social Web](http://cdn.oreillystatic.com/oreilly/booksamplers/9781449367619_sampler.pdf) 
* The code for [Chapter 1](http://bit.ly/1qCtMrr) and [Chapter 9](http://bit.ly/1u7eP33)
* [TED Talks](https://www.ted.com/talks) for examples of 10-minute talks.


**NOTE**
* Please save the notebook frequently when working in Jupyter Notebook; otherwise your changes may be lost.

*----------------------

# Problem: pick a data science problem that you plan to solve using Twitter Data
* The problem should be important and interesting, with potential impact in some area.
* The problem should be solvable using Twitter data and data science solutions.

Please briefly describe in the following cell: What problem are you trying to solve? Why is this problem important and interesting?

In [None]:
How are emojis used in the most popular tweets?

For this problem, we will be using the number of retweets on a tweet to quantify its popularity.
The following metrics/questions will be relevant to this problem:
    How does the number of emojis used influence the popularity of the tweets?
    Which combinations of emojis result in the most retweets?
    Which emojis have the greatest influence on a tweet's popularity?
    How does the time of day influence the popularity of certain emojis in tweets?
    How does the ratio of text to emojis influence the popularity of the tweet?

The problem of emoji popularity is very relevant and important to the modern social media landscape. Many companies run
social media campaigns targeting the large section of the population that uses Twitter, and any company doing this would be
interested in the conclusions we produce. Many of these companies, in their attempts to target a teen and millennial audience,
can be dismissed as 'corporate' and 'out of place' if they do not adhere to the internet social norms of emoji and 'meme' usage.
Our conclusions would help any such company by answering questions such as 'how many emojis are
too many?' and 'which emojis can I use together?'






















## Data Collection: Download Twitter Data using API

* To solve the above problem, you need to collect some Twitter data. You could select a topic that is relevant to your problem and use the Twitter API to download the relevant tweets. It is recommended that the number of tweets be larger than 200 but smaller than 1 million.
* Store the tweets you downloaded into a local file (txt file or json file) 

In [10]:
import twitter, json
#---------------------------------------------
# Define a Function to Login Twitter API
def get_oauth_login():
    # Go to http://twitter.com/apps/new to create an app and get values
    # for these credentials that you'll need to provide in place of these
    # empty string values that are defined as placeholders.
    # See https://dev.twitter.com/docs/auth/oauth for more information 
    # on Twitter's OAuth implementation.
    
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    OAUTH_TOKEN = ''
    OAUTH_TOKEN_SECRET = ''
    
    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)
    
    #twitter_api = twitter.Twitter(auth=auth)
    return auth

#print oauth_login()

#----------------------------------------------
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

auth = get_oauth_login()
twitter_api = twitter.Twitter(auth=auth)

# Search for recent English-language tweets that mention 'meme'.
# The search API returns at most 100 tweets per request, so larger
# count values are capped; we page through the results instead.
search_results = twitter_api.search.tweets(q='meme', lang='en', count=100)
statuses = search_results['statuses']

# Fetch up to 5 more pages of results
for _ in range(5):
    print('Length of statuses {}'.format(len(statuses)))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # No more results when next_results doesn't exist
        break

    # Create a dictionary from next_results, which has the following form:
    # ?max_id=313519052523986943&q=NCAA&include_entities=1
    kwargs = dict([kv.split('=') for kv in next_results[1:].split("&")])

    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

# Save at most max_tweets tweets to a local file, one JSON object per line
max_tweets = 2000
print('Starting...')
with open("tweetswithrts", "w") as tweetfile:
    for tweet in statuses[:max_tweets]:
        tweetfile.write(json.dumps(tweet) + "\n")
print('Closing...')




Length of statuses 100
Length of statuses 200
Length of statuses 300
Length of statuses 400
Length of statuses 500
Starting...
Closing...
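
The paging loop above builds `kwargs` by splitting `next_results` on `&` and `=` by hand, which would leave any percent-encoded values (for example a query such as `%23meme`) undecoded. Below is a minimal sketch of the same step using the standard library's query-string parser. It assumes Python 3 (`urllib.parse`), and the helper name `parse_next_results` is hypothetical, not part of the original notebook.

```python
from urllib.parse import parse_qsl


def parse_next_results(next_results):
    """Turn a '?key=value&...' string into a dict of keyword arguments.

    parse_qsl also decodes percent-encoded characters, so '%23meme'
    becomes '#meme'.
    """
    return dict(parse_qsl(next_results.lstrip('?')))


# Example string in the form shown in the cell above
kwargs = parse_next_results('?max_id=313519052523986943&q=NCAA&include_entities=1')
print(kwargs)
```

The resulting dictionary could be passed to `twitter_api.search.tweets(**kwargs)` in the same way as before.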


### Report  statistics about the tweets you collected 

In [11]:
# Load the saved tweets and report how many were collected (600 in this run)
import json

tweets = []
with open("tweetswithrts", "r") as tweetfile:
    for line in tweetfile:
        tweet = json.loads(line.strip())
        # Keep only records that actually contain tweet text
        if 'text' in tweet:
            tweets.append(tweet)
print('Num tweets: {}'.format(len(tweets)))


Num tweets: 600
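
Beyond the raw count, a few more summary numbers can be reported from the same list. The following is only a sketch: it assumes each tweet is a dict in the Twitter API v1.1 format with a `user` / `screen_name` field and an optional `retweeted_status` field, and the sample tweets are made up for illustration.

```python
def summarize_tweets(tweets):
    """Return basic statistics for a list of tweet dicts."""
    total = len(tweets)
    # A tweet is a retweet if it embeds the original under 'retweeted_status'
    retweets = sum(1 for t in tweets if 'retweeted_status' in t)
    authors = set(t['user']['screen_name'] for t in tweets)
    return {
        'total': total,
        'retweets': retweets,
        'originals': total - retweets,
        'unique_authors': len(authors),
    }


# Made-up tweets standing in for the collected data
sample_tweets = [
    {'text': 'RT @a: hello', 'user': {'screen_name': 'b'}, 'retweeted_status': {}},
    {'text': 'hello', 'user': {'screen_name': 'a'}},
    {'text': 'again', 'user': {'screen_name': 'a'}},
]
stats = summarize_tweets(sample_tweets)
print(stats)
```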


# Data Exploration: Exploring the Tweets and Tweet Entities

**(1) Word Count:** 
* Load the tweets you collected in the local file (txt or json)
* Compute the frequencies of the words being used in these tweets. 
* Plot a table of the top 30 most-frequent words with their counts

In [12]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary
from collections import Counter
from prettytable import PrettyTable

# Split every tweet's text on whitespace and collect all the words
wordlist = []
for status in [tweet['text'] for tweet in tweets]:
    wordlist += status.split()

# Count the words and tabulate the 30 most frequent
c = Counter(wordlist)

table = PrettyTable(field_names=["Word", "Frequency"])
for kv in c.most_common()[:30]:
    table.add_row(kv)
table.align["Word"], table.align["Frequency"] = 'l', 'r'
print(table)













+------------------+-----------+
| Word             | Frequency |
+------------------+-----------+
| RT               |       336 |
| meme             |       336 |
| the              |       286 |
| a                |       195 |
| to               |       164 |
| of               |       150 |
| is               |       137 |
| it               |        97 |
| when             |        84 |
| I                |        81 |
| meme,            |        77 |
| living           |        74 |
| doing            |        72 |
| incarnation      |        69 |
| sweetie"         |        69 |
| comes            |        69 |
| basically        |        69 |
| Jungkook         |        69 |
| great,           |        69 |
| "you're          |        69 |
| Yoongi           |        69 |
| https:/…         |        69 |
| @sugakookielove: |        68 |
| that             |        58 |
| and              |        58 |
| about            |        56 |
| Meme             |        53 |
| you     
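
The table above counts `meme`, `meme,` and `Meme` as three different words, and it is dominated by `RT` and common English words. Below is a sketch of a normalized count that lowercases tokens, strips surrounding punctuation, and skips mentions, links, and a small stop-word list. The stop-word list and the helper name `normalized_word_counts` are illustrative assumptions, and the sample texts are made up.

```python
import string
from collections import Counter

# A deliberately small, illustrative stop-word list (an assumption)
STOP_WORDS = {'rt', 'the', 'a', 'to', 'of', 'is', 'it', 'and', 'that', 'i'}


def normalized_word_counts(texts):
    """Count words case-insensitively, ignoring punctuation and stop words."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            # Skip user mentions and links rather than counting them as words
            if token.startswith(('@', 'http')):
                continue
            word = token.strip(string.punctuation).lower()
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return counts


# Made-up tweet texts for illustration
texts = ['RT @user: the Meme is great,', 'a meme, about memes']
counts = normalized_word_counts(texts)
print(counts.most_common(3))
```

Whether to drop mentions and stop words depends on the question being asked; for the emoji problem, the emoji characters themselves would need to be kept.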

**(2) Find the most popular tweets in your collection of tweets**

Please plot a table of the top 10 most-retweeted tweets in your collection, i.e., the tweets with the largest number of retweet counts.


In [4]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary
# Build (retweet count, original author, text) tuples for every
# status that is a retweet of another tweet
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'],
             status['retweeted_status']['user']['screen_name'],
             status['text'])

            # ... for each status ...
            for status in tweets

            # ... so long as the status meets this condition.
            if 'retweeted_status' in status
           ]

# Slice off the first 10 from the sorted results and display each item in the tuple

pt = PrettyTable(field_names=['Count', 'Screen Name', 'Text'])
for row in sorted(retweets, reverse=True)[:10]:
    pt.add_row(row)
pt.max_width['Text'] = 50
pt.align = 'l'
print(pt)












+-------+---------------+----------------------------------------------------+
| Count | Screen Name   | Text                                               |
+-------+---------------+----------------------------------------------------+
| 46307 | NathanZed     | RT @NathanZed: there's never been a more           |
|       |               | appropriate time to use this meme                  |
|       |               | https://t.co/hfW5aJ70XK                            |
| 46307 | NathanZed     | RT @NathanZed: there's never been a more           |
|       |               | appropriate time to use this meme                  |
|       |               | https://t.co/hfW5aJ70XK                            |
| 17214 | ImGoinScottie | RT @ImGoinScottie: 2017 Meme Update😂😂😂          |
|       |               | https://t.co/yFRK5HCqPz                            |
| 17214 | ImGoinScottie | RT @ImGoinScottie: 2017 Meme Update😂😂😂          |
|       |               | https://t.co/yFRK5HCqPz         
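
The table above lists the same original tweet more than once, because every retweet of it in the collection carries the same `retweeted_status`. Below is a sketch that keeps one row per original tweet by keying on the original's `id`. It assumes tweets in the Twitter API v1.1 format; the helper names and the sample data are made up for illustration.

```python
def top_retweeted(tweets, n=10):
    """Return up to n (count, screen_name, text) tuples, one per original tweet."""
    originals = {}
    for status in tweets:
        original = status.get('retweeted_status')
        if original is None:
            continue  # not a retweet
        # Retweets of the same tweet share the original's id
        originals[original['id']] = (
            original['retweet_count'],
            original['user']['screen_name'],
            original['text'],
        )
    return sorted(originals.values(), reverse=True)[:n]


def make_retweet(tweet_id, count, name, text):
    """Build a made-up retweet dict for the example below."""
    original = {'id': tweet_id, 'retweet_count': count,
                'user': {'screen_name': name}, 'text': text}
    return {'text': 'RT @{}: {}'.format(name, text), 'retweeted_status': original}


sample = [
    make_retweet(1, 50, 'alice', 'first'),
    make_retweet(1, 50, 'alice', 'first'),   # duplicate retweet of tweet 1
    make_retweet(2, 70, 'bob', 'second'),
    {'text': 'not a retweet'},
]
top = top_retweeted(sample)
print(top)
```

This version reports the original tweet's text rather than the `RT @...` text of the retweet.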

**(3) Find the most popular Tweet Entities in your collection of tweets**

Please plot the top 10 most-frequent hashtags and top 10 most-mentioned users in your collection of tweets.

In [13]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

# Get a list of all hashtags used in the tweets
hashtags = [ hashtag['text'] 
                for status in tweets 
                    for hashtag in status['entities']['hashtags']]

# Get a list of all users mentioned in the tweets
mentioned_users = [ user_mention['screen_name'] 
                     for status in tweets
                         for user_mention in status['entities']['user_mentions'] ]

# Tabulate the 10 most frequent hashtags
hashtable = PrettyTable(field_names=['Hashtag', 'Count'])
c = Counter(hashtags)
for kv in c.most_common()[:10]:
    hashtable.add_row(kv)
hashtable.align['Hashtag'], hashtable.align['Count'] = 'l', 'r'
print(hashtable)

# Tabulate the 10 most-mentioned users
usertable = PrettyTable(field_names=['Screen Name', 'Count'])
c = Counter(mentioned_users)
for kv in c.most_common()[:10]:
    usertable.add_row(kv)
usertable.align['Screen Name'], usertable.align['Count'] = 'l', 'r'
print(usertable)






+--------------+-------+
| Hashtag      | Count |
+--------------+-------+
| meme         |    15 |
| Jennie       |     6 |
| Lisa         |     6 |
| socialmedia  |     5 |
| Based        |     3 |
| bokep        |     3 |
| LondonBridge |     3 |
| Meme         |     3 |
| ngentot      |     3 |
| xxx          |     3 |
+--------------+-------+
+----------------+-------+
|  Screen Name   | Count |
+----------------+-------+
| sugakookielove |    68 |
|   maggieNYT    |    26 |
|   GAVlNREACT   |    23 |
|   jinkistar    |    18 |
|  vicegandako   |     6 |
|  nineteasbaby  |     6 |
|    mashable    |     5 |
| TheGenExtreme  |     5 |
|   tylerrsss    |     5 |
| KristophGavin3 |     5 |
+----------------+-------+
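
In the hashtag table, `meme` and `Meme` appear as separate rows because hashtags are counted case-sensitively. Below is a small sketch that folds case before counting. It assumes the v1.1 `entities` / `hashtags` structure used in the cell above; the helper name and the sample tweets are made up.

```python
from collections import Counter


def count_hashtags(tweets):
    """Count hashtags across tweets, treating 'Meme' and 'meme' as the same tag."""
    counts = Counter()
    for status in tweets:
        for hashtag in status.get('entities', {}).get('hashtags', []):
            counts[hashtag['text'].lower()] += 1
    return counts


# Made-up tweets for illustration
sample = [
    {'entities': {'hashtags': [{'text': 'meme'}, {'text': 'Lisa'}]}},
    {'entities': {'hashtags': [{'text': 'Meme'}]}},
    {'entities': {'hashtags': []}},
]
hashtag_counts = count_hashtags(sample)
print(hashtag_counts.most_common(10))
```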


Plot a histogram of the number of user mentions in the list using the following bins.

In [14]:
bins=[0, 10, 20, 30, 40, 50, 100]

# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

import matplotlib
matplotlib.use('TKAgg')
import matplotlib.pyplot as plt

# Count how many times each user is mentioned across all tweets
counts = Counter(mentioned_users)
listOfCounts = list(counts.values())

# Each bar shows how many users fall into each range of mention counts
plt.hist(listOfCounts, bins=bins)
plt.title("User Mentions")
plt.xlabel('Bins (number of times mentioned)')
plt.ylabel('Number of users in bin')
plt.show()
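
Since the figure itself does not survive in a text export, the numbers behind the histogram can also be computed directly. Below is a pure-Python sketch that follows the same binning convention as `plt.hist`: every bin is half-open `[left, right)` except the last, which also includes its right edge, and values outside the range are ignored. The helper name `bin_counts` is hypothetical, and the example values are the ten mention counts from the user table above.

```python
from bisect import bisect_right


def bin_counts(values, bins):
    """Count how many values fall into each bin defined by the sorted edges."""
    totals = [0] * (len(bins) - 1)
    for value in values:
        if value < bins[0] or value > bins[-1]:
            continue  # outside the overall range, so not counted
        # bisect_right finds the bin to the right of the edge; the min()
        # puts a value equal to the last edge into the last bin
        index = min(bisect_right(bins, value) - 1, len(totals) - 1)
        totals[index] += 1
    return totals


bins = [0, 10, 20, 30, 40, 50, 100]
# Mention counts of the ten most-mentioned users from the table above
top_mention_counts = [68, 26, 23, 18, 6, 6, 5, 5, 5, 5]
print(bin_counts(top_mention_counts, bins))
```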




**(4) Getting "All" friends and "All" followers of a popular user in the tweets**

* Choose a popular Twitter user in your collection of tweets who has many followers.
* Get the list of all friends and all followers of the Twitter user.
* Plot 20 of the followers, with their ID numbers and screen names, in a table.
* Plot 20 of the friends (if the user has more than 20 friends), with their ID numbers and screen names, in a table.

In [3]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary
import twitter, json, re
from functools import partial
from sys import maxint
import sys
import time
from urllib2 import URLError
from httplib import BadStatusLine
from prettytable import PrettyTable

friends_limit=20
followers_limit=20
user = 'neiltyson'

#---------------------------------------------
# Define a Function to Login Twitter API
def get_oauth_login():
    # Go to http://twitter.com/apps/new to create an app and get values
    # for these credentials that you'll need to provide in place of these
    # empty string values that are defined as placeholders.
    # See https://dev.twitter.com/docs/auth/oauth for more information 
    # on Twitter's OAuth implementation.
    
    CONSUMER_KEY = ''
    CONSUMER_SECRET = ''
    OAUTH_TOKEN = ''
    OAUTH_TOKEN_SECRET = ''
    
    auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                               CONSUMER_KEY, CONSUMER_SECRET)
    
    #twitter_api = twitter.Twitter(auth=auth)
    return auth

# Make a Twitter API request with retries on errors. From Mining the Social Web by Matthew A. Russell
def make_twitter_request(twitter_api_func, max_errors=10, *args, **kw):
    def handle_twitter_http_error(e, wait_period=2, sleep_when_rate_limited=True):
        if wait_period > 3600: # Seconds
            print >> sys.stderr, 'Too many retries. Quitting.'
            raise e	
        if e.e.code == 401:
            print >> sys.stderr, 'Encountered 401 Error (Not Authorized)'
            return None
        elif e.e.code == 404:
            print >> sys.stderr, 'Encountered 404 Error (Not Found)'
            return None
        elif e.e.code == 429:
            print >> sys.stderr, 'Encountered 429 Error (Rate Limit Exceeded)'
            if sleep_when_rate_limited:
                print >> sys.stderr, "Retrying in 15 minutes...ZzZ..."
                sys.stderr.flush()
                time.sleep(60*15 + 5)
                print >> sys.stderr, '...ZzZ...Awake now and trying again.'
                return 2
            else:
                raise e # Caller must handle the rate limiting issue
        elif e.e.code in (500, 502, 503, 504):
            print >> sys.stderr, 'Encountered %i Error. Retrying in %i seconds' % \
                (e.e.code, wait_period)
            time.sleep(wait_period)
            wait_period *= 1.5
            return wait_period
        else:
            raise e
    # End of nested helper function
    wait_period = 2
    error_count = 0
    while True:
        try:
            return twitter_api_func(*args, **kw)
        except twitter.api.TwitterHTTPError, e:
            error_count = 0
            wait_period = handle_twitter_http_error(e, wait_period)
            if wait_period is None:
                return
        except URLError, e:
            error_count += 1
            print >> sys.stderr, "URLError encountered. Continuing."
            if error_count > max_errors:
                print >> sys.stderr, "Too many consecutive errors...bailing out."
                raise
        except BadStatusLine, e:
            error_count += 1
            print >> sys.stderr, "BadStatusLine encountered. Continuing."
            if error_count > max_errors:
                print >> sys.stderr, "Too many consecutive errors...bailing out."
                raise

# Get the friend and follower IDs as a tuple (friends_ids, followers_ids)
def get_friends_followers_ids(twitter_api, screen_name=None, user_id=None, friends_limit=maxint, followers_limit=maxint):
    # Must have either screen_name or user_id (logical xor)
    assert (screen_name is not None) != (user_id is not None), \
    "Must have screen_name or user_id, but not both"
    get_friends_ids = partial(make_twitter_request, twitter_api.friends.ids, count=5000)
    get_followers_ids = partial(make_twitter_request, twitter_api.followers.ids, count=5000)
    friends_ids, followers_ids = [], []
    for twitter_api_func, limit, ids, label in [[get_friends_ids, friends_limit, friends_ids, "friends"], [get_followers_ids, followers_limit, followers_ids, "followers"]]:
        if limit == 0: continue
        cursor = -1
        while cursor != 0:
            # Use make_twitter_request via the partially bound callable...
            if screen_name:
                response = twitter_api_func(screen_name=screen_name, cursor=cursor)
            else: # user_id
                response = twitter_api_func(user_id=user_id, cursor=cursor)
            if response is not None:
                ids += response['ids']
                cursor = response['next_cursor']
            print >> sys.stderr, 'Fetched {0} total {1} ids for {2}'.format(len(ids), label, (user_id or screen_name))
            # XXX: You may want to store data during each iteration to provide
            # an additional layer of protection from exceptional circumstances
            if len(ids) >= limit or response is None:
                break
    # Do something useful with the IDs, like store them to disk...
    return friends_ids[:friends_limit], followers_ids[:followers_limit]

# Get a dict of user profiles keyed by ID or screen name. From Mining the Social Web by Matthew A. Russell
def get_user_profile(twitter_api, user_ids=None, screen_names=None):
    # Must have either screen_names or user_ids (logical xor)
    assert (screen_names is not None) != (user_ids is not None), \
    "Must have screen_names or user_ids, but not both"
    items_to_info = {}
    items = screen_names or user_ids
    while len(items) > 0:
        # Process 100 items at a time per the API specifications for /users/lookup.
        items_str = ','.join([str(item) for item in items[:100]])
        items = items[100:]
        if screen_names:
            response = make_twitter_request(twitter_api.users.lookup, screen_name=items_str)
        else: # user_ids
            response = make_twitter_request(twitter_api.users.lookup, user_id=items_str)
        # make_twitter_request returns None on 401/404 errors; skip that batch
        if response is None:
            continue
        for user_info in response:
            if screen_names:
                items_to_info[user_info['screen_name']] = user_info
            else: # user_ids
                items_to_info[user_info['id']] = user_info
    return items_to_info

# Get the screen names for a list of IDs, in the same order as id_list
def get_user_from_profiles(id_list, profile_list):
    user_list = []
    for user_id in id_list:
        # /users/lookup omits suspended or deleted accounts, so an ID may be missing
        profile = profile_list.get(user_id)
        user_list.append(profile['screen_name'] if profile else '(unavailable)')
    return user_list

# Sample usage
auth = get_oauth_login()
twitter_api = twitter.Twitter(auth=auth)
friends_ids, followers_ids = get_friends_followers_ids(twitter_api, screen_name=user, friends_limit=friends_limit, followers_limit=followers_limit)

# Get the usernames from the IDs
friends_profiles = get_user_profile(twitter_api, user_ids=friends_ids)
friends_username = get_user_from_profiles(friends_ids, friends_profiles)

followers_profiles = get_user_profile(twitter_api, user_ids=followers_ids)
followers_username = get_user_from_profiles(followers_ids, followers_profiles)

print 'User is: ' + user

# Show the friends in a PrettyTable
print 'Friends'
pt = PrettyTable()
pt.add_column('Username', friends_username)
pt.add_column('ID', friends_ids)
pt.align = 'l'
print pt

# Show the followers in a PrettyTable
print 'Followers'
pt = PrettyTable()
pt.add_column('Username', followers_username)
pt.add_column('ID', followers_ids)
pt.align = 'l'
print pt

Fetched 51 total friends ids for neiltyson
Fetched 5000 total followers ids for neiltyson


User is: neiltyson
Friends
+-----------------+------------+
| Username        | ID         |
+-----------------+------------+
| DefenseIntel    | 117439544  |
| DeptofDefense   | 66369181   |
| USNavy          | 54885400   |
| DARPA           | 54645160   |
| oldpicsarchive  | 2441831348 |
| republicofmath  | 84653539   |
| Snowden         | 2916305152 |
| TheTweetOfGod   | 204832963  |
| levarburton     | 18396070   |
| LisaLampanelli  | 19542638   |
| BrannonBraga    | 851253492  |
| GirlsAreGeeks   | 127386221  |
| bug_gwen        | 19563103   |
| PaulProvenza    | 124050122  |
| billmaher       | 19697415   |
| JimGaffigan     | 6539592    |
| SarahKSilverman | 30364057   |
| BorowitzReport  | 17293897   |
| WhoopiGoldberg  | 284602545  |
| Burghound       | 17425538   |
+-----------------+------------+
Followers
+-----------------+--------------------+
| Username        | ID                 |
+-----------------+--------------------+
| Mcfrickinsophia | 844778915814031360 |
| pablo
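
The friends and followers endpoints are paged with a cursor: the first request uses cursor `-1`, each response carries a `next_cursor`, and a `next_cursor` of `0` means there are no more pages. The sketch below isolates that loop so it can be tried without network access. `fetch_all_ids`, `fake_fetch`, and the page data are stand-ins made up for illustration; a real `fetch_page` would wrap a call such as `twitter_api.followers.ids`.

```python
def fetch_all_ids(fetch_page, limit=None):
    """Collect IDs from a cursored endpoint.

    fetch_page(cursor) must return a dict with 'ids' and 'next_cursor',
    the shape used by the friends/ids and followers/ids responses.
    """
    ids = []
    cursor = -1  # -1 requests the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page(cursor)
        ids.extend(page['ids'])
        cursor = page['next_cursor']
        if limit is not None and len(ids) >= limit:
            break  # stop early once enough IDs have been collected
    return ids if limit is None else ids[:limit]


# Stand-in for the real API: three fake pages keyed by cursor (made-up data)
FAKE_PAGES = {
    -1: {'ids': [1, 2, 3], 'next_cursor': 10},
    10: {'ids': [4, 5, 6], 'next_cursor': 20},
    20: {'ids': [7], 'next_cursor': 0},
}


def fake_fetch(cursor):
    return FAKE_PAGES[cursor]


all_ids = fetch_all_ids(fake_fetch)
first_four = fetch_all_ids(fake_fetch, limit=4)
print(all_ids)
print(first_four)
```

Stopping as soon as the limit is reached matters here because each page request counts against the API rate limit.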

# The Solution: implement a data science solution to the problem you are trying to solve.

Briefly describe the idea of your solution to the problem in the following cell:

Write code to implement the solution in Python:

In [None]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary

import matplotlib.pyplot as plt

# Read the pre-computed emoji data: each line holds an emoji count
# followed by a retweet count, separated by a space
emojiCounts = []
retweets = []

with open("emojidata", "r") as emojidatafile:
    for line in emojidatafile:
        data = line.rstrip().split(" ")
        if len(data) >= 2:
            emojiCounts.append(int(data[0]))
            retweets.append(int(data[1]))

# Scatter plot of retweets against the number of emojis used
plt.scatter(emojiCounts, retweets)
plt.axis([0, 10, 0, 50000])
plt.title("Effects of Emojis on Retweets")
plt.ylabel("Retweets")
plt.xlabel("Emoji Count")
plt.show()
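
The cell above reads emoji counts from a file, but the notebook does not show how they are computed. The sketch below shows one possible way to count emojis in a tweet's text and to average retweets per emoji count. It assumes Python 3, and the character ranges are only a rough approximation of "emoji": they cover the main pictograph blocks, miss flags and keycap sequences, and count each code point separately, so skin-tone modifiers and joined sequences count more than once. The helper names are hypothetical, and all sample rows except the first (taken from the retweet table above) are made up.

```python
import re
from collections import defaultdict

# Rough emoji matcher (an assumption, not a complete definition of emoji)
EMOJI_PATTERN = re.compile('[\U0001F300-\U0001FAFF\u2600-\u27BF]')


def count_emojis(text):
    """Count the code points in text that fall in the rough emoji ranges."""
    return len(EMOJI_PATTERN.findall(text))


def mean_retweets_by_emoji_count(pairs):
    """Average the retweet counts for each emoji count.

    pairs is an iterable of (emoji_count, retweet_count) tuples.
    """
    grouped = defaultdict(list)
    for emoji_count, retweet_count in pairs:
        grouped[emoji_count].append(retweet_count)
    return {k: sum(v) / float(len(v)) for k, v in grouped.items()}


# (text, retweet count) rows; only the first comes from the table above
sample = [
    ('2017 Meme Update\U0001F602\U0001F602\U0001F602', 17214),
    ('plain text', 10),
    ('more plain text', 20),
]
pairs = [(count_emojis(text), rts) for text, rts in sample]
averages = mean_retweets_by_emoji_count(pairs)
print(averages)
```

Averages per emoji count are easier to compare than a raw scatter plot when a few viral tweets dominate the retweet counts, though with few tweets per group they remain noisy.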
        













# Results: summarize and visualize the results discovered from the analysis

Please use figures, tables, or videos to communicate the results with the audience.


In [7]:
# Your code starts here
#   Please add comments or text cells in between to explain the general idea of each block of the code.
#   Please feel free to add more cells below this cell if necessary



*-----------------
# Done

All set! 

**What do you need to submit?**

* **Notebook File**: Save this Jupyter notebook, and find the notebook file in your folder (for example, "filename.ipynb"). This is the file you need to submit. Please make sure all the plotted tables and figures are in the notebook. If you used "jupyter notebook --pylab=inline" to open the notebook, all the figures and tables should have shown up in the notebook.

* **PPT Slides**: Please prepare PPT slides (for a 10-minute talk) to present the case study. Each team will present its case study in class for 10 minutes.

Please compress all the files into a single zip file.


**How to submit:**

        Please submit through Canvas, in the Assignment "Case Study 1".
        
**Note: Each team needs to make only one submission in Canvas**


# Peer-Review Grading Template:

**Total Points: (100 points)** Please don't worry about the absolute scores; we will rescale the final grades according to the performance of all teams in the class.

Please add an "**X**" mark in front of your rating: 

For example:

*2: bad*
          
**X** *3: good*
    
*4: perfect*


    ---------------------------------
    The Problem: 
    ---------------------------------
    
    1. (5 points) How well did the team describe the problem they are trying to solve using Twitter data? 
       0: not clear
       1: I can barely understand the problem
       2: okay, can be improved
       3: good, but can be improved
       4: very good
       5: crystal clear
    
    2. (10 points) do you think the problem is important or has a potential impact?
        0: not important at all
        2: not sure if it is important
        4: seems important, but not clear
        6: interesting problem
        8: an important problem, which I want to know the answer myself
       10: very important, I would be happy to invest money in a project like this.
    
    ----------------------------------
    Data Collection:
    ----------------------------------
    
    3. (10 points) Do you think the data collected are relevant and sufficient for solving the above problem? 
       0: not clear
       2: I can barely understand what data they are trying to collect
       4: I can barely understand why the data is relevant to the problem
       6: the data are relevant to the problem, but better data can be collected
       8: the data collected are relevant and at a proper scale (> 300 tweets)
      10: the data are properly collected and they are sufficient

    -----------------------------------
    Data Exploration:
    -----------------------------------
    4. How well did the team solve the following task:
    (1) Word Count (5 points):
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect
    
    (2) Find the most popular tweets in your collection of tweets: (5 points)
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect
    
    (3) Find popular twitter entities  (5 points)
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect

    (4) Find user's followers and friends (5 points)
       0: missing answer
       1: okay, but with major problems
       3: good, but with minor problems
       5: perfect

    -----------------------------------
    The Solution
    -----------------------------------
    5.  how well did the team describe the solution they used to solve the problem? 
       0: not clear
       2: I can barely understand
       4: okay, can be improved
       6: good, but can be improved
       8: very good
       10: crystal clear
       
    6. How well does the solution solve the problem? 
       0: not relevant
       1: barely relevant to the problem
       2: okay solution, but there is an easier solution.
       3: good, but can be improved
       4: very good, but solution is simple/old
       5: innovative and technically sound
       
    7. how well did the team implement the solution in python? 
       0: the code is not relevant to the solution proposed
       2: the code is barely understandable, but not relevant
       4: okay, the code is clear but incorrect
       6: good, the code is correct, but with major errors
       8: very good, the code is correct, but with minor errors
      10: perfect 
   
    -----------------------------------
    The Results
    -----------------------------------
     8.  How well did the team present the results they found in the data? 
       0: not clear
       2: I can barely understand
       4: okay, can be improved
       6: good, but can be improved
       8: very good
      10: crystal clear
       
     9.  What do you think of the results they found in the data? 
       0: not clear
       1: likely to be wrong
       2: okay, maybe wrong
       3: good, but can be improved
       4: make sense, but not interesting
       5: make sense and very interesting
     
    -----------------------------------
    The Presentation
    -----------------------------------
    10. How well do all the different parts (data, problem, solution, result) fit together as a coherent story?  
       0: they are irrelevant
       1: I can barely understand how they are related to each other
       2: okay, the problem is good, but the solution doesn't match well, or the problem is not solvable.
       3: good, but the results don't make much sense in the context
       4: very good fit, but not exciting (the storyline can be improved/polished)
       5: a perfect story
      
    11. Did the presenter make good use of the 10 minutes for presentation?  
       0: the team didn't present
       1: bad, barely finished a small part of the talk
       2: okay, barely finished most parts of the talk.
       3: good, finished all parts of the talk, but some part is rushed
       4: very good, but the allocation of time on different parts can be improved.
       5: perfect timing and good use of time      

    12. How would you rate the presentation (overall quality)?  
       0: the team didn't present
       1: bad
       2: okay
       3: good
       4: very good
       5: perfect


    -----------------------------------
    Overall: 
    -----------------------------------
    13. How many points out of the 100 do you give to this project in total?  Please don't worry about the absolute scores; we will rescale the final grades according to the performance of all teams in the class.
    Total score:
    
    14. What are the strengths of this project? Briefly, list up to 3 strengths.
       1: 
       2:
       3:
    
    15. What are the weaknesses of this project? Briefly, list up to 3 weaknesses.
       1:
       2:
       3:
    
    16. Detailed comments and suggestions. What suggestions do you have for improving the quality of this project further?
    
    
    

    ---------------------------------
    Your Vote: 
    ---------------------------------
    1. [Overall Quality] Between the two submissions that you are reviewing, which team do you think deserves the better score?  
       -1: I vote the other team is better than this team
        0: the same
        1: I vote this team is better than the other team 
        
    2. [Presentation] Among all the teams in the presentation, which team do you think deserves the best presentation award for this case study?  
        1: Team 1
        2: Team 2
        3: Team 3
        4: Team 4
        5: Team 5
        6: Team 6
        7: Team 7
        8: Team 8
        9: Team 9
       10: Team 10

