# Task
## Part 1
Write a script that downloads tweet data on a specific search topic using the standard search API. The script should contain the following functions:
1. scrape_tweets(), which takes the following parameters:
    1. Search topic
    2. The number of tweets to download per request
    3. The number of requests  

    And returns a dataframe.
2. Save_results_as_csv(), which takes the following parameter:
    1. The dataframe from the above function  

    And returns a CSV file with the following naming format:
    
    *tweets_downloaded_yymmdd_hhmmss.csv (where ‘yymmdd_hhmmss’ is the current timestamp)*
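The naming format above can be sketched as follows. This is a minimal sketch rather than a reference solution: the helper `build_csv_name` and the `directory` parameter are assumptions added for illustration, and the function only relies on the dataframe exposing pandas' `to_csv()` method.

```python
from datetime import datetime
from pathlib import Path


def build_csv_name(now=None):
    """Return 'tweets_downloaded_yymmdd_hhmmss.csv' for the given (or current) time."""
    now = now or datetime.now()
    return f"tweets_downloaded_{now:%y%m%d_%H%M%S}.csv"


def save_results_as_csv(tweets_df, directory="."):
    """Write the dataframe to a timestamped CSV file and return the file path."""
    path = Path(directory) / build_csv_name()
    # index=False keeps the pandas row index out of the file
    tweets_df.to_csv(path, index=False)
    return path
```

For example, `build_csv_name(datetime(2020, 9, 5, 14, 3, 7))` gives `tweets_downloaded_200905_140307.csv`. Returning the path lets the caller log or reuse the file name.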

The following attributes of the tweets should be extracted:
* Tweet text
* Tweet id
* Source
* Coordinates
* Retweet count
* Likes count
* User info
    - Username
    - Screenname
    - Location
    - Friends count
    - Verification status
    - Description
    - Followers count

Make sure to not include retweets.  
Make sure the same tweet does not appear multiple times in your final CSV.
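The two rules above (no retweets, no repeated tweets) can be illustrated on plain dictionaries shaped like the API's tweet JSON. This is a sketch under assumptions: the function name `drop_retweets_and_duplicates` and the sample records are made up for illustration, and a retweet is recognised by the presence of a `retweeted_status` key.

```python
def drop_retweets_and_duplicates(tweets):
    """Keep the first copy of each original tweet, preserving order."""
    seen_ids = set()
    unique = []
    for tweet in tweets:
        # Retweets carry the original tweet under 'retweeted_status'
        if "retweeted_status" in tweet:
            continue
        # Skip a tweet whose id has already been kept
        if tweet["id"] in seen_ids:
            continue
        seen_ids.add(tweet["id"])
        unique.append(tweet)
    return unique


# Made-up sample records, for illustration only
sample = [
    {"id": 1, "full_text": "original"},
    {"id": 2, "full_text": "RT ...", "retweeted_status": {"id": 1}},
    {"id": 1, "full_text": "original"},
    {"id": 3, "full_text": "another"},
]
print([t["id"] for t in drop_retweets_and_duplicates(sample)])  # [1, 3]
```

When the tweets are already in a pandas dataframe, `drop_duplicates(subset="id")` achieves the same de-duplication.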

## Part 2
Create a MongoDB database called Tweets_db and store the extracted tweets in a collection named raw_tweets.
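One possible shape for this step, using PyMongo, is sketched below. It assumes a MongoDB server reachable at the default local URI and that each record is a dictionary with an `id` field. The helper names `get_raw_tweets_collection` and `store_tweets` are hypothetical. Using the tweet id as `_id` with an upsert is a design choice made here so that re-running the script does not store the same tweet twice.

```python
def get_raw_tweets_collection(uri="mongodb://localhost:27017"):
    """Return the raw_tweets collection of Tweets_db.

    MongoDB creates the database and collection on the first write.
    """
    # Imported here so store_tweets() below has no hard dependency on pymongo
    from pymongo import MongoClient
    return MongoClient(uri)["Tweets_db"]["raw_tweets"]


def store_tweets(collection, records):
    """Upsert each record keyed by its tweet id and return how many were written."""
    written = 0
    for record in records:
        document = dict(record)
        document["_id"] = document["id"]  # reuse the tweet id as the primary key
        collection.replace_one({"_id": document["_id"]}, document, upsert=True)
        written += 1
    return written
```

A typical call would be `store_tweets(get_raw_tweets_collection(), tweets_df.to_dict("records"))`, where `tweets_df` is the dataframe returned by scrape_tweets().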


Relevant resources:  
Twitter API docs: https://developer.twitter.com/en/docs/twitter-api/v1/tweets/search/api-reference/get-search-tweets  
Tweepy docs: http://docs.tweepy.org/en/latest/api.html  
Installing mongoDB locally: https://docs.mongodb.com/manual/administration/install-community/  
Creating CRUD applications for MongoDB with python: https://www.mongodb.com/blog/post/getting-started-with-python-and-mongodb


## Install Packages

In [1]:
# Package to manipulate env file
!pip install python-decouple

# Tweepy
!pip install tweepy

Collecting python-decouple
  Downloading python-decouple-3.3.tar.gz (10 kB)
Building wheels for collected packages: python-decouple
  Building wheel for python-decouple (setup.py): started
  Building wheel for python-decouple (setup.py): finished with status 'done'
  Created wheel for python-decouple: filename=python_decouple-3.3-py3-none-any.whl size=9030 sha256=a7b56f452a9f0479d99b6dca04fd9b3d28f88e7599e28f0d80af5aaff3f21047
  Stored in directory: c:\users\kingsley\appdata\local\pip\cache\wheels\8a\01\7f\f40899a3f94a9e2307b6bda65b9a513a3cffaa6d3c3b6cf739
Successfully built python-decouple
Installing collected packages: python-decouple
Successfully installed python-decouple-3.3
Collecting tweepy
  Downloading tweepy-3.9.0-py2.py3-none-any.whl (30 kB)
Collecting requests-oauthlib>=0.7.0
  Downloading requests_oauthlib-1.3.0-py2.py3-none-any.whl (23 kB)
Collecting PySocks!=1.5.7,>=1.5.6; extra == "socks"
  Downloading PySocks-1.7.1-py3-none-any.whl (16 kB)
Collecting oauthlib>=3.0.0

## Import Packages

In [1]:
from decouple import config
import tweepy
import pandas as pd

print("Packages imported successfully.")

Packages imported successfully.


## Retrieve API access details from .env

In [2]:
consumer_key = config('API-KEY')
consumer_secret = config('API-SECRET-KEY')
access_token = config('ACCESS-TOKEN')
access_token_secret = config('ACCESS-TOKEN-SECRET')

print("API access details retrieved successfully.")

API access details retrieved successfully.


## Authenticating User

In [3]:
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True)

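# tweepy.API() always returns an object, so check the credentials explicitly
try:
    authenticated = bool(api.verify_credentials())
except tweepy.TweepError:
    authenticated = False

if not authenticated:
    raise SystemExit("Authentication failed!")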

print("Authentication successful.")

Authentication successful.


## Scrape data and store in a DataFrame

In [51]:
def scrape_tweets(api_obj: tweepy.API, query: str, tweets_per_request: int, max_requests: int) -> pd.DataFrame:
    columns = ['tweet', 'id', 'source', 'coordinates', 'retweetCount', 'likeCount', 'username', 'screenName', 'location', 'friendsCount', 'verificationStatus', 'description', 'followersCount']

    # A single cursor pages through the results: `count` sets the tweets per request
    # and pages() caps the number of requests, so each request returns new tweets
    # instead of re-fetching the first page.
    cursor = tweepy.Cursor(api_obj.search, q=query, lang='en', tweet_mode='extended', count=tweets_per_request)

    rows = []
    for page in cursor.pages(max_requests):
        for tweet in page:
            # Skip retweets
            if hasattr(tweet, 'retweeted_status'):
                continue

            rows.append([
                tweet.full_text,
                tweet.id,
                tweet.source,
                tweet.coordinates,
                tweet.retweet_count,
                tweet.favorite_count,  # likes on the tweet itself; user.favourites_count is likes given by the user
                tweet.user.name,
                tweet.user.screen_name,
                tweet.user.location,
                tweet.user.friends_count,
                tweet.user.verified,
                tweet.user.description,
                tweet.user.followers_count,
            ])

    tweets_df = pd.DataFrame(rows, columns=columns)

    # Drop any tweet that was returned more than once
    return tweets_df.drop_duplicates(subset='id').reset_index(drop=True)


In [None]:
query = "Messi"
tweets_no = 15
max_requests = 2

response = scrape_tweets(api, query, tweets_no, max_requests)

print(len(response))

In [50]:
# scrape_tweets() returns a DataFrame; head is a method and must be called
response.head()