# Twitter data

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

# Twitter API Access

Twitter implements OAuth 1.0A as its standard authentication mechanism. To use it to make requests to Twitter's API, go to https://dev.twitter.com/apps and create a sample application.

Choose any name for your application, write a description and use `http://google.com` for the website.

Under **Key and Access Tokens**, there are four primary identifiers you'll need to note for an OAuth 1.0A workflow: 
* consumer key, 
* consumer secret, 
* access token, and 
* access token secret (Click on Create Access Token to create those).

Note that you will need an ordinary Twitter account in order to log in, create an app, and get these credentials.

The first time you run the notebook, fill in all four credentials so that they are saved to the `pkl` file. After that, you can remove the secret keys from the notebook, because they will be loaded from the `pkl` file.

The `pkl` file contains sensitive information that can be used to take control of your Twitter account. **Do not share it**.

In [1]:
import pickle
import os

In [2]:
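if not os.path.exists('secret_twitter_credentials.pkl'):
    # First run only: fill in the four credentials from your Twitter app,
    # then clear them from the notebook once the pkl file exists.
    Twitter = {'Consumer Key': '',
               'Consumer Secret': '',
               'Access Token': '',
               'Access Token Secret': ''}
    with open('secret_twitter_credentials.pkl', 'wb') as f:
        pickle.dump(Twitter, f)
else:
    with open('secret_twitter_credentials.pkl', 'rb') as f:
        Twitter = pickle.load(f)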

Install the `twitter` package to interface with the Twitter API:

In [3]:
#!pip install twitter

## Example 1. Authorizing an application to access Twitter account data

In [4]:
import twitter

auth = twitter.oauth.OAuth(Twitter['Access Token'],
                           Twitter['Access Token Secret'],
                           Twitter['Consumer Key'],
                           Twitter['Consumer Secret'])

twitter_api = twitter.Twitter(auth=auth)

# Nothing to see by displaying twitter_api except that it's now a
# defined variable

print(twitter_api)

<twitter.api.Twitter object at 0x108b612b0>


## Example 2. Retrieving trends

Twitter identifies locations using the Yahoo! Where On Earth ID.

The Yahoo! Where On Earth ID for the entire world is 1.
See https://dev.twitter.com/docs/api/1.1/get/trends/place and
http://developer.yahoo.com/geo/geoplanet/

You can also look up locations with the BOSS placefinder: https://developer.yahoo.com/boss/placefinder/

In [5]:
WORLD_WOE_ID = 1
US_WOE_ID = 23424977

Look up the WOEID for [San Diego](http://woeid.rosselliot.co.nz/lookup/san%20diego%20%20ca).

You can change it to another location.

In [6]:
LOCAL_WOE_ID=2487889
#LOCAL_WOE_ID=725003

# Prefix ID with the underscore for query string parameterization.
# Without the underscore, the twitter package appends the ID value
# to the URL itself as a special case keyword argument.

world_trends = twitter_api.trends.place(_id=WORLD_WOE_ID)
us_trends = twitter_api.trends.place(_id=US_WOE_ID)
local_trends = twitter_api.trends.place(_id=LOCAL_WOE_ID)
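
The comment above describes how the `twitter` package treats `id` and `_id` differently. The toy function below is not the package's code. It is a simplified sketch, with a hypothetical `build_request` helper, that only illustrates the difference between a value appended to the URL path and one sent as a query-string parameter.

```python
from urllib.parse import urlencode

def build_request(path, **kwargs):
    """Hypothetical helper mimicking the behaviour described above."""
    params = dict(kwargs)
    # A plain `id` keyword is treated as part of the URL path.
    path_id = params.pop('id', None)
    if path_id is not None:
        path = '{}/{}'.format(path, path_id)
    # An `_id` keyword is renamed to `id` and kept as a query parameter.
    if '_id' in params:
        params['id'] = params.pop('_id')
    query = urlencode(params)
    return path + ('?' + query if query else '')

with_underscore = build_request('trends/place', _id=1)
without_underscore = build_request('trends/place', id=1)
print(with_underscore)
print(without_underscore)
```

The trends endpoint expects the location as a query parameter, which is why the calls above pass `_id`.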

In [7]:
world_trends[0]

{'trends': [{'name': '#InternationalWomensDay',
   'url': 'http://twitter.com/search?q=%23InternationalWomensDay',
   'promoted_content': None,
   'query': '%23InternationalWomensDay',
   'tweet_volume': 1903610},
  {'name': '#OurFirstLoveYoongi',
   'url': 'http://twitter.com/search?q=%23OurFirstLoveYoongi',
   'promoted_content': None,
   'query': '%23OurFirstLoveYoongi',
   'tweet_volume': 1298689},
  {'name': '#النصر_الاتفاق',
   'url': 'http://twitter.com/search?q=%23%D8%A7%D9%84%D9%86%D8%B5%D8%B1_%D8%A7%D9%84%D8%A7%D8%AA%D9%81%D8%A7%D9%82',
   'promoted_content': None,
   'query': '%23%D8%A7%D9%84%D9%86%D8%B5%D8%B1_%D8%A7%D9%84%D8%A7%D8%AA%D9%81%D8%A7%D9%82',
   'tweet_volume': 257614},
  {'name': '#HAPPYSUGADAY',
   'url': 'http://twitter.com/search?q=%23HAPPYSUGADAY',
   'promoted_content': None,
   'query': '%23HAPPYSUGADAY',
   'tweet_volume': 1805196},
  {'name': '#الهلال_الوحده',
   'url': 'http://twitter.com/search?q=%23%D8%A7%D9%84%D9%87%D9%84%D8%A7%D9%84_%D8%A7%D9%84%D9%

In [8]:
trends=local_trends
print(type(trends))
print(list(trends[0].keys()))
print(trends[0]['trends'])

<class 'twitter.api.TwitterListResponse'>
['trends', 'as_of', 'created_at', 'locations']
[{'name': '#InternationalWomensDay', 'url': 'http://twitter.com/search?q=%23InternationalWomensDay', 'promoted_content': None, 'query': '%23InternationalWomensDay', 'tweet_volume': 1903610}, {'name': '#IWD2019', 'url': 'http://twitter.com/search?q=%23IWD2019', 'promoted_content': None, 'query': '%23IWD2019', 'tweet_volume': 870062}, {'name': '#WomensDay', 'url': 'http://twitter.com/search?q=%23WomensDay', 'promoted_content': None, 'query': '%23WomensDay', 'tweet_volume': 432076}, {'name': '#BalanceforBetter', 'url': 'http://twitter.com/search?q=%23BalanceforBetter', 'promoted_content': None, 'query': '%23BalanceforBetter', 'tweet_volume': 280068}, {'name': '#OurFirstLoveYoongi', 'url': 'http://twitter.com/search?q=%23OurFirstLoveYoongi', 'promoted_content': None, 'query': '%23OurFirstLoveYoongi', 'tweet_volume': 1298689}, {'name': 'Bill Shine', 'url': 'http://twitter.com/search?q=%22Bill+Shine%22',
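
Each entry in `trends[0]['trends']` is a plain dictionary, so the usual list tools apply. As a sketch, the snippet below ranks a few of the trends shown above by `tweet_volume`. The `sample_trends` list is a hand-copied subset of that output, and the entry whose `tweet_volume` is `None` is a made-up placeholder, included on the assumption that the API may leave the volume empty for some trends.

```python
sample_trends = [
    {'name': '#IWD2019', 'tweet_volume': 870062},
    {'name': '#InternationalWomensDay', 'tweet_volume': 1903610},
    {'name': 'placeholder trend', 'tweet_volume': None},  # hypothetical entry
    {'name': '#WomensDay', 'tweet_volume': 432076},
]

# Treat a missing volume as 0 so that sorting never compares None with int.
ranked = sorted(sample_trends,
                key=lambda trend: trend['tweet_volume'] or 0,
                reverse=True)

ranked_names = [trend['name'] for trend in ranked]
print(ranked_names)
```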

## Example 3. Displaying API responses as pretty-printed JSON

In [9]:
import json

print(json.dumps(local_trends[:2], indent=1))

[
 {
  "trends": [
   {
    "name": "#InternationalWomensDay",
    "url": "http://twitter.com/search?q=%23InternationalWomensDay",
    "promoted_content": null,
    "query": "%23InternationalWomensDay",
    "tweet_volume": 1903610
   },
   {
    "name": "#IWD2019",
    "url": "http://twitter.com/search?q=%23IWD2019",
    "promoted_content": null,
    "query": "%23IWD2019",
    "tweet_volume": 870062
   },
   {
    "name": "#WomensDay",
    "url": "http://twitter.com/search?q=%23WomensDay",
    "promoted_content": null,
    "query": "%23WomensDay",
    "tweet_volume": 432076
   },
   {
    "name": "#BalanceforBetter",
    "url": "http://twitter.com/search?q=%23BalanceforBetter",
    "promoted_content": null,
    "query": "%23BalanceforBetter",
    "tweet_volume": 280068
   },
   {
    "name": "#OurFirstLoveYoongi",
    "url": "http://twitter.com/search?q=%23OurFirstLoveYoongi",
    "promoted_content": null,
    "query": "%23OurFirstLoveYoongi",
    "tweet_volume": 1298689
   },
   {
   

## Example 4. Computing the intersection of two sets of trends

In [10]:
trends_set = {}
trends_set['world'] = set([trend['name'] 
                        for trend in world_trends[0]['trends']])

trends_set['us'] = set([trend['name'] 
                     for trend in us_trends[0]['trends']]) 

trends_set['san diego'] = set([trend['name'] 
                     for trend in local_trends[0]['trends']]) 

trends_set

{'world': {'#1000Kimyadeğişendünya',
  '#8M2019',
  '#8Mar',
  '#Airwolf',
  '#ArkaSokaklarFinal',
  '#DevletinBekası',
  '#DiaInternacionalDeLaMujer',
  '#DíaDeLaMujer',
  '#FeministGeceYürüyüşü',
  '#FreeCodeFridayContest',
  '#GlowingMinPDay',
  '#GreatestTaeyeonDay',
  '#GüzelBirŞeylerYaz',
  '#HAPPYSUGADAY',
  '#HuelgaFeminista2019',
  '#IWD2109',
  '#IWDay2019',
  '#InternationalWomensDay',
  '#MansurdanBüyükVurgun',
  '#MargiesMark',
  '#MinstradamusDay',
  '#NationalWomensDay',
  '#NuestroGenioMusicalYoongi',
  '#OurFirstLoveYoongi',
  '#PlaylistBBB',
  '#SamsunluKızlarSüperdir',
  '#SesVerTürkiye',
  '#SheInspiresMe',
  '#WhatAReliefYoongiWasBorn',
  '#YoongiLetsDreamTogether',
  '#Ziyahocamfelsefeciata',
  '#sweeps',
  '#الاتحاد_الفيحاء',
  '#النصر_الاتفاق',
  '#الهلال_الوحده',
  '#슈가생일ᄎᄏ',
  '#윤기_멋대로_살아_전부_니꺼야',
  '3BinÖzel EgitimciAta',
  'Bill Shine',
  'Chelsea Manning',
  'FreybetteYasal KrediKartı',
  'Hilbet 10LiraVeriyor',
  'Jan-Michael Vincent',
  'KurandaKadın Dövm

In [11]:
for loc in ['world','us','san diego']:
    print(('-'*10,loc))
    print(','.join(trends_set[loc]))

('----------', 'world')
#النصر_الاتفاق,#الهلال_الوحده,#DíaDeLaMujer,#DiaInternacionalDeLaMujer,#HAPPYSUGADAY,#윤기_멋대로_살아_전부_니꺼야,#MansurdanBüyükVurgun,#MinstradamusDay,#FeministGeceYürüyüşü,#슈가생일ᄎᄏ,KurandaKadın DövmekYoktur,#FreeCodeFridayContest,#PlaylistBBB,#WhatAReliefYoongiWasBorn,Chelsea Manning,#GreatestTaeyeonDay,#1000Kimyadeğişendünya,#ArkaSokaklarFinal,Min Yoongi,Bill Shine,#HuelgaFeminista2019,#IWD2109,#SheInspiresMe,İlkokullara 2300Besyocu,#Ziyahocamfelsefeciata,#sweeps,#الاتحاد_الفيحاء,#SesVerTürkiye,#DevletinBekası,3BinÖzel EgitimciAta,#NuestroGenioMusicalYoongi,震度3,#GlowingMinPDay,Shane Larkin,#SamsunluKızlarSüperdir,#Airwolf,#InternationalWomensDay,#8Mar,PrizmabetATMden ParaCekimi,Jan-Michael Vincent,FreybetteYasal KrediKartı,#8M2019,#IWDay2019,Hilbet 10LiraVeriyor,Maradona,#MargiesMark,#YoongiLetsDreamTogether,#NationalWomensDay,#OurFirstLoveYoongi,#GüzelBirŞeylerYaz
('----------', 'us')
Corbett,#DiaInternacionalDeLaMujer,#윤기_멋대로_살아_전부_니꺼야,#WomensDay,#MinstradamusDay,Vern

In [12]:
print('='*10, 'intersection of world and us')
print(trends_set['world'].intersection(trends_set['us']))

print('='*10, 'intersection of us and san-diego')
print(trends_set['san diego'].intersection(trends_set['us']))

{'#InternationalWomensDay', 'Min Yoongi', '#DiaInternacionalDeLaMujer', 'Bill Shine', '#윤기_멋대로_살아_전부_니꺼야', '#GlowingMinPDay', '#MinstradamusDay', '#NationalWomensDay', '#IWD2109', '#FreeCodeFridayContest', '#SheInspiresMe', '#sweeps', '#MargiesMark', '#WhatAReliefYoongiWasBorn', 'Chelsea Manning', '#OurFirstLoveYoongi', 'Jan-Michael Vincent'}
{'Corbett', '#DiaInternacionalDeLaMujer', '#윤기_멋대로_살아_전부_니꺼야', '#MinstradamusDay', '#WomensDay', 'Vernon', '#FreeCodeFridayContest', '#ForThePeople', '#HappyWomensDay2019', '#ArchivesHerstory', '#WhatAReliefYoongiWasBorn', 'Chelsea Manning', 'Airwolf', 'Carlos Hyde', '#WithoutPets', '#QuotesFromASeashell', 'Min Yoongi', '#FridayMotivation', 'Bill Shine', '#SomethingintheWater', '#girlpower', '#IWD2109', '#SheInspiresMe', 'Dan Jenkins', "Women's Day", '#sweeps', '#IWD2019', '#SXSW19', '#IWD19', '#FridayThoughts', '#WomenInSTEM', '#SITW', 'Happy International', '#InternationalWomensDay', 'Stephanie Flowers', 'Jan-Michael Vincent', '#BalanceforBetter
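
Intersection is only one of the set operations available. The sketch below uses small hand-picked subsets of the trend names above (the names `world` and `us` are local to this example) to show how set difference picks out trends found in one location but not the other.

```python
# Small subsets of the trend names shown above, for illustration only.
world = {'#InternationalWomensDay', '#HAPPYSUGADAY', 'Bill Shine', 'Maradona'}
us = {'#InternationalWomensDay', 'Bill Shine', '#SXSW19', '#WomenInSTEM'}

common = world & us          # same as world.intersection(us)
only_world = world - us      # trending worldwide but not in the US
only_us = us - world         # trending in the US but not worldwide

print(sorted(common))
print(sorted(only_world))
print(sorted(only_us))
```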

## Example 5. Collecting search results

Set the variable `q` to a trending topic,
or anything else for that matter. The example query below
was a trending topic when this content was being developed
and is used throughout the remainder of this chapter.

In [112]:
q = '#guitar' 

number = 100

# See https://dev.twitter.com/docs/api/1.1/get/search/tweets

search_results = twitter_api.search.tweets(q=q, count=number)

statuses = search_results['statuses']

In [113]:
#len(statuses)
print(statuses)

[{'created_at': 'Sat Mar 09 13:25:14 +0000 2019', 'id': 1104372572077080576, 'id_str': '1104372572077080576', 'text': 'A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm… https://t.co/8OM1APaLuI', 'truncated': True, 'entities': {'hashtags': [{'text': 'womensday', 'indices': [21, 31]}, {'text': 'guitar', 'indices': [37, 44]}, {'text': 'guitarpeople', 'indices': [45, 58]}, {'text': 'thetool', 'indices': [59, 67]}, {'text': 'music', 'indices': [68, 74]}, {'text': 'sweden', 'indices': [75, 82]}, {'text': 'stockholm', 'indices': [83, 93]}], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/8OM1APaLuI', 'expanded_url': 'https://twitter.com/i/web/status/1104372572077080576', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [95, 118]}]}, 'metadata': {'iso_language_code': 'en', 'result_type': 'recent'}, 'source': '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>', 'in_reply_to_status_id': None, 'in_reply_to_st

In [114]:
print(json.dumps(statuses, indent=2))

[
  {
    "created_at": "Sat Mar 09 13:25:14 +0000 2019",
    "id": 1104372572077080576,
    "id_str": "1104372572077080576",
    "text": "A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm\u2026 https://t.co/8OM1APaLuI",
    "truncated": true,
    "entities": {
      "hashtags": [
        {
          "text": "womensday",
          "indices": [
            21,
            31
          ]
        },
        {
          "text": "guitar",
          "indices": [
            37,
            44
          ]
        },
        {
          "text": "guitarpeople",
          "indices": [
            45,
            58
          ]
        },
        {
          "text": "thetool",
          "indices": [
            59,
            67
          ]
        },
        {
          "text": "music",
          "indices": [
            68,
            74
          ]
        },
        {
          "text": "sweden",
          "indices": [
            75,
            8
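
The `\u2026` sequence in the output above appears because `json.dumps` escapes non-ASCII characters by default. Passing `ensure_ascii=False` keeps the original characters, which can be easier to read for multilingual tweets. The sample string is a fragment of the tweet text shown above.

```python
import json

sample = ['#sweden #stockholm…']

escaped = json.dumps(sample)                       # default: ASCII-only output
readable = json.dumps(sample, ensure_ascii=False)  # keep characters as-is

print(escaped)
print(readable)
```

Both strings decode back to the same data with `json.loads`, so the choice only affects how the output looks.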

Twitter often returns duplicate results. We can filter them out by checking for duplicate texts:

In [115]:
all_text = []
filtered_statuses = []
for s in statuses:
    if s["text"] not in all_text:
        filtered_statuses.append(s)
        all_text.append(s["text"])
statuses = filtered_statuses

In [116]:
len(statuses)

71
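
The loop above checks each text against a list, which rescans the list for every status. A possible variation, sketched here on made-up sample data, tracks the texts already seen in a set so that each membership test is a hash lookup. The `dedupe_by_text` name is hypothetical.

```python
def dedupe_by_text(statuses):
    """Return statuses with unique 'text' values, keeping first occurrences."""
    seen = set()
    unique = []
    for status in statuses:
        if status['text'] not in seen:
            seen.add(status['text'])
            unique.append(status)
    return unique

# Made-up statuses: the second and third share the same text.
sample = [
    {'id': 1, 'text': 'first #guitar tweet'},
    {'id': 2, 'text': 'RT @someone: win a #guitar'},
    {'id': 3, 'text': 'RT @someone: win a #guitar'},
]
unique_ids = [s['id'] for s in dedupe_by_text(sample)]
print(unique_ids)
```

For 100 statuses the difference is negligible, but the set-based version scales better to larger collections.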

In [117]:
[s['text'] for s in search_results['statuses']]

['A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm… https://t.co/8OM1APaLuI',
 'マイギター\n\n改造中です笑\n\n#guitar #ギター #改造 #ゴダン https://t.co/KT2aPI73CP',
 'My new song "Waiting for Jack for/with Jack Thammarat is coming out soon. With #jensmayerguitar and #jackthammarat… https://t.co/ReDRH7Kwyf',
 "RT @Song7th: B'zの松本さんに倉木さん✨\n\nヒルパンの歴史はスゲーぜ！\n\n#guitar\n#ヒルズパン工場 https://t.co/SwZV4MvQ6B",
 "*TEASER*\n\nWe are excited share a small taste of what this band is about and what we have been working on:\n\n'NEW YOR… https://t.co/E4LorFMUvc",
 'https://t.co/fb8LtP1LDZ @linkinpark @linkinparkfr @mikeshinoda #chesterbenningtons #cover #rock #metal #rap #guitar… https://t.co/usQK7dzr9c',
 'RT @ShinsukeSada: DEANさん広島コンサート終了致しました‼️✨✨✨\nいやぁ最高に盛り上がりました😆‼️\n広島の皆さまありがとうございます‼️✨✨\n\n#deanfujioka \n#borntomakehistory \n#1stasiatour2019 \n#広島…',
 'RT @Shun29058325: 【ギター解説】\n超絶テクなし！音楽的な演奏を…MR.BIG - Just Take My Heart\nのギターソロを解説しました！\nhttps://t.co/R600LCHEWP\n\n🎸宜しければチャ

In [118]:
for i in all_text:
    print("\n")
    print(i)



A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm… https://t.co/8OM1APaLuI


マイギター

改造中です笑

#guitar #ギター #改造 #ゴダン https://t.co/KT2aPI73CP


My new song "Waiting for Jack for/with Jack Thammarat is coming out soon. With #jensmayerguitar and #jackthammarat… https://t.co/ReDRH7Kwyf


RT @Song7th: B'zの松本さんに倉木さん✨

ヒルパンの歴史はスゲーぜ！

#guitar
#ヒルズパン工場 https://t.co/SwZV4MvQ6B


*TEASER*

We are excited share a small taste of what this band is about and what we have been working on:

'NEW YOR… https://t.co/E4LorFMUvc


https://t.co/fb8LtP1LDZ @linkinpark @linkinparkfr @mikeshinoda #chesterbenningtons #cover #rock #metal #rap #guitar… https://t.co/usQK7dzr9c


RT @ShinsukeSada: DEANさん広島コンサート終了致しました‼️✨✨✨
いやぁ最高に盛り上がりました😆‼️
広島の皆さまありがとうございます‼️✨✨

#deanfujioka 
#borntomakehistory 
#1stasiatour2019 
#広島…


RT @Shun29058325: 【ギター解説】
超絶テクなし！音楽的な演奏を…MR.BIG - Just Take My Heart
のギターソロを解説しました！
https://t.co/R600LCHEWP

🎸宜しければチャンネル登録をお願い致します🎵
https:/…


RT @ELDaisy61

In [119]:
# Show one sample search result by slicing the list...
print(json.dumps(statuses[:1], indent=1))

[
 {
  "created_at": "Sat Mar 09 13:25:14 +0000 2019",
  "id": 1104372572077080576,
  "id_str": "1104372572077080576",
  "text": "A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm\u2026 https://t.co/8OM1APaLuI",
  "truncated": true,
  "entities": {
   "hashtags": [
    {
     "text": "womensday",
     "indices": [
      21,
      31
     ]
    },
    {
     "text": "guitar",
     "indices": [
      37,
      44
     ]
    },
    {
     "text": "guitarpeople",
     "indices": [
      45,
      58
     ]
    },
    {
     "text": "thetool",
     "indices": [
      59,
      67
     ]
    },
    {
     "text": "music",
     "indices": [
      68,
      74
     ]
    },
    {
     "text": "sweden",
     "indices": [
      75,
      82
     ]
    },
    {
     "text": "stockholm",
     "indices": [
      83,
      93
     ]
    }
   ],
   "symbols": [],
   "user_mentions": [],
   "urls": [
    {
     "url": "https://t.co/8OM1APaLuI",
     "expande

In [120]:
# Pick one status by its position in the list and assign it to the variable t.
t = statuses[5]
# Alternatively, select a status by its id with a list comprehension, which
# returns a one-element list that can be indexed with [0]:
#[ status for status in statuses 
#          if status['id'] == 316948241264549888 ][0]

# Explore the variable t to get familiarized with the data structure...

#print(t['retweet_count'])
#print(t['retweeted'])
#print(t['geo'])
#print(type(t))
for i,j in t.items():
    print(i,j)

created_at Sat Mar 09 13:20:08 +0000 2019
id 1104371288825626624
id_str 1104371288825626624
text https://t.co/fb8LtP1LDZ @linkinpark @linkinparkfr @mikeshinoda #chesterbenningtons #cover #rock #metal #rap #guitar… https://t.co/usQK7dzr9c
truncated True
entities {'hashtags': [{'text': 'chesterbenningtons', 'indices': [63, 82]}, {'text': 'cover', 'indices': [83, 89]}, {'text': 'rock', 'indices': [90, 95]}, {'text': 'metal', 'indices': [96, 102]}, {'text': 'rap', 'indices': [103, 107]}, {'text': 'guitar', 'indices': [108, 115]}], 'symbols': [], 'user_mentions': [{'screen_name': 'linkinpark', 'name': 'LINKIN PARK', 'id': 19373710, 'id_str': '19373710', 'indices': [24, 35]}, {'screen_name': 'linkinparkfr', 'name': 'Linkin Park France', 'id': 43541137, 'id_str': '43541137', 'indices': [36, 49]}, {'screen_name': 'mikeshinoda', 'name': 'Mike Shinoda', 'id': 66528154, 'id_str': '66528154', 'indices': [50, 62]}], 'urls': [{'url': 'https://t.co/fb8LtP1LDZ', 'expanded_url': 'https://www.youtube.co
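
The `created_at` field is a plain string. As a sketch, it can be converted to a timezone-aware `datetime` with `datetime.strptime`. The format string below is an assumption inferred from the sample value shown above, not taken from Twitter's documentation.

```python
from datetime import datetime, timezone

created_at = 'Sat Mar 09 13:20:08 +0000 2019'  # value copied from the output above

# %a and %b expect English day and month abbreviations (the default C locale).
parsed = datetime.strptime(created_at, '%a %b %d %H:%M:%S %z %Y')
print(parsed.isoformat())
```

Once parsed, the timestamps can be compared, sorted, or grouped by hour or day.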

## Example 6. Extracting text, screen names, and hashtags from tweets

In [121]:
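import string
import nltk
# The stopword list requires a one-time nltk.download('stopwords').
# Treat stopwords, ASCII punctuation, and the bullet character as noise tokens.
useless_words = nltk.corpus.stopwords.words("english") + list(string.punctuation + '•')
useless_words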

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 "you're",
 "you've",
 "you'll",
 "you'd",
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 "she's",
 'her',
 'hers',
 'herself',
 'it',
 "it's",
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 "that'll",
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each

In [122]:
status_texts = [ status['text'] 
                 for status in statuses ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

hashtags = [ hashtag['text'] 
             for status in statuses
                 for hashtag in status['entities']['hashtags'] ]

# Compute a collection of all words from all tweets
words1 = [ w 
          for t in status_texts 
              for w in t.split()]

words = []

for w in words1:
    if w not in useless_words:
        words.append(w)

In [123]:
# Explore the first 5 items for each...

print(json.dumps(status_texts[0:5], indent=1))
print(json.dumps(screen_names[0:5], indent=1)) 
print(json.dumps(hashtags[0:5], indent=1))
print(json.dumps(words[0:5], indent=1))

[
 "A beautiful gift for #womensday? its #guitar #guitarpeople #thetool #music #sweden #stockholm\u2026 https://t.co/8OM1APaLuI",
 "\u30de\u30a4\u30ae\u30bf\u30fc\n\n\u6539\u9020\u4e2d\u3067\u3059\u7b11\n\n#guitar #\u30ae\u30bf\u30fc #\u6539\u9020 #\u30b4\u30c0\u30f3 https://t.co/KT2aPI73CP",
 "My new song \"Waiting for Jack for/with Jack Thammarat is coming out soon. With #jensmayerguitar and #jackthammarat\u2026 https://t.co/ReDRH7Kwyf",
 "RT @Song7th: B'z\u306e\u677e\u672c\u3055\u3093\u306b\u5009\u6728\u3055\u3093\u2728\n\n\u30d2\u30eb\u30d1\u30f3\u306e\u6b74\u53f2\u306f\u30b9\u30b2\u30fc\u305c\uff01\n\n#guitar\n#\u30d2\u30eb\u30ba\u30d1\u30f3\u5de5\u5834 https://t.co/SwZV4MvQ6B",
 "*TEASER*\n\nWe are excited share a small taste of what this band is about and what we have been working on:\n\n'NEW YOR\u2026 https://t.co/E4LorFMUvc"
]
[
 "Song7th",
 "linkinpark",
 "linkinparkfr",
 "mikeshinoda",
 "ShinsukeSada"
]
[
 "womensday",
 "guitar",
 "guitarpeople",
 "thetool",
 "music"
]
[
 "A

## Example 7. Creating a basic frequency distribution from the words in tweets

In [124]:
from collections import Counter

for item in [words, screen_names, hashtags]:
    c = Counter(item)
    print(c.most_common(20)) # top 20
    print()

[('#guitar', 28), ('RT', 20), ('#Win', 13), ('#RT', 13), ('&amp;', 12), ('#music', 9), ('Guitar', 9), ('#Enter', 9), ('#Contest', 9), ('#ReTweet', 8), ('Guitar!', 6), ('#ギター', 5), ('My', 5), ('#rock', 5), ('#guitar…', 5), ('#Giveaway', 5), ('#ReTweet…', 5), ('Electric', 5), ('#cover', 4), ('Giveaway!', 4)]

[('aldioustoki', 2), ('choonhq', 2), ('Song7th', 1), ('linkinpark', 1), ('linkinparkfr', 1), ('mikeshinoda', 1), ('ShinsukeSada', 1), ('Shun29058325', 1), ('ELDaisy61', 1), ('Aldious_USA', 1), ('Beralto89', 1), ('tobel0ved', 1), ('daveleask', 1), ('TheOldDancer', 1), ('junglee_pic', 1), ('YouTube', 1), ('ny2wn', 1), ('nickcave', 1), ('levihickory', 1), ('MansonsGuitars', 1)]

[('guitar', 33), ('Win', 13), ('RT', 13), ('ReTweet', 13), ('Giveaway', 11), ('Contest', 11), ('music', 10), ('Enter', 9), ('Guitar', 7), ('ギター', 5), ('rock', 5), ('cover', 4), ('Free', 4), ('Gear', 4), ('Amp', 4), ('metal', 3), ('WIN', 3), ('Blockchain', 3), ('love', 3), ('live', 3)]
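
In the word counts above, `Guitar`, `Guitar!`, `#guitar` and `#guitar…` are tallied separately because tokens are compared exactly as they appear. One possible refinement, sketched below with a hypothetical `normalize` helper and a few sample tokens taken from that output, lowercases each token and strips surrounding punctuation before counting.

```python
import string
from collections import Counter

# Characters to trim from both ends of a token; '…' is added because
# truncated tweets end with it. '#' is in string.punctuation, so hashtags
# and plain words collapse into the same token.
STRIP_CHARS = string.punctuation + '…'

def normalize(token):
    """Lowercase a token and strip punctuation from its ends."""
    return token.lower().strip(STRIP_CHARS)

sample_tokens = ['#guitar', 'Guitar', 'Guitar!', '#guitar…', '#music', 'RT']
counts = Counter(normalize(t) for t in sample_tokens)
print(counts.most_common(1))
```

Whether hashtags and plain words should be merged depends on the analysis. Leaving `#` out of `STRIP_CHARS` would keep them apart.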



## Example 8. Create a prettyprint function to display tuples in a nice tabular format

In [125]:
def prettyprint_counts(label, list_of_tuples):
    print("\n{:^24} | {:>10}".format(label, "Count"))
    print("*"*40)
    for k,v in list_of_tuples:
        print("{:24} | {:>10}".format(k,v))

In [126]:
for label, data in (('Word', words), 
                    ('Screen Name', screen_names), 
                    ('Hashtag', hashtags)):
    
    c = Counter(data)
    prettyprint_counts(label, c.most_common()[:10])


          Word           |      Count
****************************************
#guitar                  |         28
RT                       |         20
#Win                     |         13
#RT                      |         13
&amp;                    |         12
#music                   |          9
Guitar                   |          9
#Enter                   |          9
#Contest                 |          9
#ReTweet                 |          8

      Screen Name        |      Count
****************************************
aldioustoki              |          2
choonhq                  |          2
Song7th                  |          1
linkinpark               |          1
linkinparkfr             |          1
mikeshinoda              |          1
ShinsukeSada             |          1
Shun29058325             |          1
ELDaisy61                |          1
Aldious_USA              |          1

        Hashtag          |      Count
****************************************


## Example 9. Finding the most popular retweets

In [127]:
retweets = [
            # Store out a tuple of these three values ...
            (status['retweet_count'], 
             status['retweeted_status']['user']['screen_name'],
             status['text'].replace("\n","\\")) 
            
            # ... for each status ...
            for status in statuses 
            
            # ... so long as the status meets this condition.
                if 'retweeted_status' in status
           ]
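
Only statuses that are retweets carry a `retweeted_status` key, which is why the comprehension filters on it before reading the nested screen name. The sketch below runs the same comprehension on two made-up statuses to show that the original tweet is skipped rather than raising a `KeyError`.

```python
# Made-up statuses: one retweet and one original tweet.
sample_statuses = [
    {'text': 'RT @alice: new #guitar\nvideo',
     'retweet_count': 7,
     'retweeted_status': {'user': {'screen_name': 'alice'}}},
    {'text': 'my own #guitar tweet', 'retweet_count': 0},
]

sample_retweets = [
    (status['retweet_count'],
     status['retweeted_status']['user']['screen_name'],
     status['text'].replace("\n", "\\"))
    for status in sample_statuses
        if 'retweeted_status' in status
]
print(sample_retweets)
```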

We can build another `prettyprint` function to print entire tweets with their retweet count.

We also want to split the text of the tweet into up to three lines, if needed.

In [128]:
row_template = "{:^7} | {:^15} | {:50}"
def prettyprint_tweets(list_of_tuples):
    print()
    print(row_template.format("Count", "Screen Name", "Text"))
    print("*"*60)
    for count, screen_name, text in list_of_tuples:
        print(row_template.format(count, screen_name, text[:50]))
        if len(text) > 50:
            print(row_template.format("", "", text[50:100]))
            if len(text) > 100:
                print(row_template.format("", "", text[100:]))
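
The nested `if` statements above hard-code three rows. A more general version, sketched here with a hypothetical `split_text` helper, cuts the text into fixed-width chunks and keeps at most `max_lines` of them. Unlike the function above, which puts everything past character 100 on the third row, this variant drops any text beyond the last chunk.

```python
def split_text(text, width=50, max_lines=3):
    """Cut text into chunks of `width` characters, keeping at most `max_lines`."""
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    return chunks[:max_lines]

sample_text = 'x' * 120  # made-up text, 120 characters long
lines = split_text(sample_text)
line_lengths = [len(line) for line in lines]
print(line_lengths)
```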

In [131]:
# Slice off the first 10 from the sorted results and display each item in the tuple

prettyprint_tweets(sorted(retweets, reverse=True)[:10])


 Count  |   Screen Name   | Text                                              
************************************************************
  40    |  ShinsukeSada   | RT @ShinsukeSada: DEANさん広島コンサート終了致しました‼️✨✨✨\いやぁ最高に
        |                 | 盛り上がりました😆‼️\広島の皆さまありがとうございます‼️✨✨\\#deanfujioka \#b
        |                 | orntomakehistory \#1stasiatour2019 \#広島…          
  36    |   ryoowatari    | RT @ryoowatari: 4.26金曜日、大型連休の前日、\#高円寺ジロキチ でライヴやります
        |                 | ！初めて名を冠してのライヴ！\良き夜にします！是非っ！\#doasinfinity #doas #大
        |                 | 渡亮\#ryoowatari #guitar #session\#live #…          
  22    |     Song7th     | RT @Song7th: B'zの松本さんに倉木さん✨\\ヒルパンの歴史はスゲーぜ！\\#guita
        |                 | r\#ヒルズパン工場 https://t.co/SwZV4MvQ6B                
  21    |  Gomitakashi53  | RT @Gomitakashi53: 今日は千葉県松戸市でお待ちしてます\#TBOLAN  #ele
        |                 | ctro53 #Guitar #トイプードル #松戸森のホール21 https://t.co/MEe
        |                 | y03sa23                                  
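
`sorted(retweets, reverse=True)` compares whole tuples, so retweets with equal counts are ordered by screen name and then by text, in reverse. If only the count should matter, a `key` function can be passed instead. The sketch below uses made-up tuples and keeps ties in their original order.

```python
from operator import itemgetter

# Made-up (count, screen_name, text) tuples.
sample_retweets = [
    (22, 'bob', 'second'),
    (40, 'alice', 'first'),
    (22, 'carol', 'third'),
]

# Sort on the count only; Python's sort is stable, so 'bob' stays ahead of 'carol'.
by_count = sorted(sample_retweets, key=itemgetter(0), reverse=True)
top_names = [name for _, name, _ in by_count]
print(top_names)
```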