# Tweet Extraction using tweepy

**Let's install the tweepy library**

In [None]:
!pip install tweepy

Now we will import the libraries and authorize our connection with the API with the keys we obtained in the last step.

In [None]:
# import tweepy

import tweepy as tw

# your Twitter API key and API secret
my_api_key = "XXXXXXXXXXXXXXXXX"
my_api_secret = "XXXXXXXXXXXXXXXXXXXXXXX"

# authenticate our credentials to login

autho = tw.OAuthHandler(my_api_key, my_api_secret)
api = tw.API(autho, wait_on_rate_limit=True)
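Hard-coding the keys in a notebook makes them easy to leak when the notebook is shared. One possible alternative is sketched below, assuming the keys are stored in environment variables. The variable names `TWITTER_API_KEY` and `TWITTER_API_SECRET` and the helper `load_credentials` are hypothetical and chosen only for illustration.

```python
import os


def load_credentials(env=os.environ):
    """Read the API key and secret from environment variables.

    The variable names below are illustrative assumptions, not names
    that Twitter or tweepy require.
    """
    key = env.get("TWITTER_API_KEY")
    secret = env.get("TWITTER_API_SECRET")
    if not key or not secret:
        raise RuntimeError("Set TWITTER_API_KEY and TWITTER_API_SECRET first")
    return key, secret
```

The returned pair could then be passed to tw.OAuthHandler in place of my_api_key and my_api_secret.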

Now we will set a search query. This is the hashtag we want to search for; the -filter:retweets operator excludes retweets from the results.

In [None]:
search_query = "#christmas -filter:retweets"
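If several hashtags are to be searched, the query string can be assembled by a small helper instead of being typed by hand. The following is a minimal sketch; `build_query` is a hypothetical helper, not part of tweepy.

```python
def build_query(hashtag, exclude_retweets=True):
    """Build a search query string for a single hashtag."""
    # accept the hashtag with or without the leading '#'
    query = "#" + hashtag.lstrip("#")
    if exclude_retweets:
        # the -filter:retweets operator drops retweets from the results
        query += " -filter:retweets"
    return query


search_query = build_query("christmas")
print(search_query)
```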

## Collecting the data


We will use the Tweepy Cursor to fetch the tweets.

It returns an object which can be iterated over to get the API responses. 

We will be fetching 50 tweets for the search query specified above.

In [None]:
# get tweets from the API
# (written for tweepy 3.x; in tweepy 4.x api.search is named api.search_tweets)

tweets = tw.Cursor(api.search,
              q=search_query,
              lang="en",
              since="2020-09-16").items(50)

# store the API responses in a list

tweets_copy = []

for tweet in tweets:
    tweets_copy.append(tweet)
    
print("Total Tweets fetched:", len(tweets_copy))

Here, we pass the api.search method as an argument, along with the search query, the language of the tweets, and the date from which to search.
We also limit the number of items (i.e. tweets, in this case) to 50. The responses are iterated over and saved to the list tweets_copy.
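To make the idea of a cursor with an item limit more concrete, here is a conceptual sketch that needs no API access. It is not how tweepy is implemented internally; `fake_search` and `iterate_items` are hypothetical stand-ins that only mimic the behaviour of fetching results page by page until a limit is reached.

```python
def fake_search(page, page_size=20, total=45):
    """Stand-in for one API call: return the items on a given page."""
    start = page * page_size
    return list(range(start, min(start + page_size, total)))


def iterate_items(fetch_page, limit):
    """Yield items page by page until `limit` items have been produced."""
    produced = 0
    page = 0
    while produced < limit:
        results = fetch_page(page)
        if not results:
            # no more pages: stop even if the limit was not reached
            return
        for item in results:
            if produced >= limit:
                return
            yield item
            produced += 1
        page += 1


items = list(iterate_items(fake_search, limit=50))
print("Total items fetched:", len(items))
```

In this sketch the limit acts as an upper bound: when fewer results exist than were requested, the iteration simply ends early, which is why it is worth printing the number of tweets actually fetched.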

## Creating a Dataset

We will now create a dataset (a pandas dataframe) from the attributes of the tweets received from the API, so that we can use the collected data for sentiment analysis.

In [None]:
import pandas as pd

# collect one dictionary per tweet, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []

for tweet in tweets_copy:
    # hashtags attached to the tweet, without the leading '#'
    hashtags = [hashtag["text"] for hashtag in tweet.entities.get("hashtags", [])]
    try:
        # tweet.text may be truncated, so fetch the full text by tweet id
        text = api.get_status(id=tweet.id, tweet_mode='extended').full_text
    except Exception:
        # fall back to the (possibly truncated) text if the lookup fails
        text = tweet.text
    rows.append({'user_name': tweet.user.name,
                 'user_location': tweet.user.location,
                 'user_description': tweet.user.description,
                 'user_verified': tweet.user.verified,
                 'date': tweet.created_at,
                 'text': text,
                 'hashtags': hashtags if hashtags else None,
                 'source': tweet.source})

# initialize and populate the dataframe
tweets_df = pd.DataFrame(rows)

# show the dataframe
tweets_df.head()

Here, the dataframe tweets_df is populated with different attributes of the Tweet, such as the user’s name, location, and description, and the tweet’s timestamp, text, and hashtags.

Also, note that for the tweet’s text we do not use tweet.text; instead, we call the API again with the tweet id and fetch its full text. This is because tweet.text can be truncated and may not contain the full text of the Tweet.
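Calling the API once per tweet counts against the rate limit. A possible alternative, assuming the search itself is made with tweet_mode='extended' so that each result already carries a full_text attribute, is to read that attribute directly and fall back to text when it is missing. The sketch below uses stand-in objects rather than real API responses, and `get_full_text` is a hypothetical helper.

```python
from types import SimpleNamespace


def get_full_text(tweet):
    """Return the untruncated text when available, else the short text."""
    return getattr(tweet, "full_text", None) or tweet.text


# stand-in objects that mimic the two shapes of response (values are invented)
extended = SimpleNamespace(text="Merry Chr...", full_text="Merry Christmas, everyone!")
compact = SimpleNamespace(text="Merry Christmas!")

print(get_full_text(extended))
print(get_full_text(compact))
```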

Having the data stored as a dataframe is quite useful for further analysis and reference.
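As a small illustration of such further analysis, the sketch below counts hashtags and selects tweets from verified accounts. It runs on a hand-made sample with a few of the same columns as tweets_df; the values are invented, and `sample_df` is a hypothetical stand-in for the real data.

```python
import pandas as pd

# small hand-made sample with a few of the columns of tweets_df (invented values)
sample_df = pd.DataFrame([
    {"user_name": "a", "user_verified": True, "hashtags": ["christmas", "snow"]},
    {"user_name": "b", "user_verified": False, "hashtags": ["christmas"]},
    {"user_name": "c", "user_verified": False, "hashtags": None},
])

# one row per hashtag, dropping tweets without any hashtags
hashtag_counts = sample_df["hashtags"].explode().dropna().value_counts()

# tweets written by verified accounts
verified_df = sample_df[sample_df["user_verified"]]

print(hashtag_counts)
print(len(verified_df))
```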