# Title

In [1]:
import json

with open("credentials.json") as f:
    credentials = json.load(f)

ig_user_id = credentials['am_ig_user_id']
# access_token = credentials['am_ig_access_token']
# group_id = credentials['STEMM_group_id']
access_token = credentials['am_fb_access_token']
group_id = credentials['SSC_group_id']

# Session 1
## Prompt 1
Create a python function that will make a GET request to pull the posts from a Facebook group using the Facebook API. Output the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, and comments. Also output the response.json(). Include parameters for number of posts.

### Response
To make a GET request to pull posts from a Facebook group, you will need to use the Facebook Graph API. You will also need an access token with the appropriate permissions to access the group's content. Here is a Python function that uses the requests library to make the API request and outputs the results into a pandas DataFrame:

```python
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, and comments.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count
        })
    df = pd.DataFrame(posts)
    return df, data
```
This function takes in three parameters:

- `group_id`: The ID of the Facebook group to pull posts from.
- `access_token`: An access token with the appropriate permissions to access the group's content.
- `limit`: The number of posts to retrieve (default is 100).

The function constructs the API request URL and parameters using the group ID and access token. It then makes the API request using the requests library and converts the JSON response into a Python dictionary using the response.json() method.

The function then extracts the relevant data from each post in the response using the get() method and appends it to a list. This list is then converted into a pandas DataFrame using the pd.DataFrame() method.

Finally, the function returns the DataFrame and the original JSON response as a tuple. This allows you to inspect the raw response data if needed.

Note that this function only retrieves the posts and their associated data. If you need to retrieve additional information such as post reactions or user information, you will need to modify the fields parameter in the API request URL.
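
To make the extraction step concrete, here is a minimal offline sketch of how the chained `get()` calls flatten one post. The `sample_post` dictionary and the `post_to_row` helper are hypothetical names introduced for illustration; the sample only assumes the shape that the function above reads from `response.json()["data"]`.

```python
# Hypothetical sample shaped like one element of response.json()["data"].
sample_post = {
    "id": "123_456",
    "created_time": "2022-03-18T15:00:00+0000",
    "message": "Hello group",
    "permalink_url": "https://www.facebook.com/groups/123/permalink/456/",
    "comments": {"data": [], "summary": {"total_count": 3}},
}


def post_to_row(post):
    """Flatten one post dict into the row layout used by get_facebook_posts()."""
    return {
        "created_time": post.get("created_time"),
        "message": post.get("message"),
        "permalink_url": post.get("permalink_url"),
        # Each .get() supplies an empty dict (or 0) when a level is missing,
        # so a post without a "comments" key yields 0 instead of a KeyError.
        "comments_count": post.get("comments", {}).get("summary", {}).get("total_count", 0),
    }


row = post_to_row(sample_post)
bare = post_to_row({"created_time": "2022-03-17T17:30:00+0000"})
print(row["comments_count"], bare["comments_count"], bare["message"])
```

A post with no message (for example, a photo-only post) comes through as `None` rather than raising, which is why the DataFrame can contain missing values in the `message` column.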

## Prompt 2: "how do I call the function?"

### Response: 
You can call the get_facebook_posts() function by passing in the required parameters. Here's an example:
```python
# Set the required parameters
group_id = "123456789"  # Replace with your group ID
access_token = "your-access-token"  # Replace with your access token

# Call the function to retrieve the posts
df, data = get_facebook_posts(group_id, access_token, limit=50)

# Print the DataFrame
print(df.head())
```
In this example, we're passing in the group_id and access_token parameters, and specifying a limit of 50 posts to retrieve. The function returns a pandas DataFrame containing the posts and their associated data, which we assign to the variable df. We also assign the original JSON response to the variable data.

Finally, we print the first 5 rows of the DataFrame using the head() method. Note that you can modify the function call and DataFrame output as needed to suit your specific use case.

### Test the code

In [23]:
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, and comments.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count
        })
    df = pd.DataFrame(posts)
    return df, data

df, data = get_facebook_posts(group_id, access_token, limit=50)
df

KeyError: 'data'
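
The `KeyError` means the parsed response had no `"data"` key. One plausible cause is that the API returned an error object instead of a feed; the notebook output does not confirm this. The sketch below shows one way to surface such a payload with a readable message. It assumes the Graph API error shape `{"error": {"message", "type", "code"}}`; `GraphAPIError` and `extract_posts` are hypothetical names, and the values in `error_payload` are made up for the example.

```python
class GraphAPIError(Exception):
    """Hypothetical exception type for an error payload from the Graph API."""


def extract_posts(payload):
    """Return payload["data"], or raise a readable error if it is missing."""
    if "error" in payload:
        err = payload["error"]
        raise GraphAPIError(
            f'{err.get("type", "Error")} (code {err.get("code")}): {err.get("message")}'
        )
    if "data" not in payload:
        raise GraphAPIError(f"Unexpected response keys: {sorted(payload)}")
    return payload["data"]


# Illustrative payload only; not taken from the actual failed request.
error_payload = {
    "error": {
        "message": "Invalid OAuth access token.",
        "type": "OAuthException",
        "code": 190,
    }
}

try:
    extract_posts(error_payload)
except GraphAPIError as exc:
    print("Request failed:", exc)
```

Inside `get_facebook_posts()`, the loop over `data["data"]` could iterate over `extract_posts(data)` instead, so a failed request reports the API's own message rather than a bare `KeyError`.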

## Prompt 3: "Modify the function to print the response status code"

### Response
To print the response status code, you can modify the get_facebook_posts() function as follows:

```python
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, and comments.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count
        })
    df = pd.DataFrame(posts)
    print("Response Status Code:", status_code)
    return df, data
```
In this modified version of the function, we added a new variable status_code that captures the response status code from the response.status_code attribute, and a print() statement that prints it. Note that the print() call runs only after the DataFrame has been built, so it is skipped if an earlier line raises an exception.

Now when you call the get_facebook_posts() function, it prints the status code and returns the DataFrame and the response data. For example:
```python
# Set the required parameters
group_id = "123456789"  # Replace with your group ID
access_token = "your-access-token"  # Replace with your access token

# Call the function to retrieve the posts
df, data = get_facebook_posts(group_id, access_token, limit=50)

# Print the DataFrame and response status code
print(df.head())
```
This will print out the first 5 rows of the DataFrame along with the response status code, like so:

```text
Response Status Code: 200
               created_time  ... comments_count
0  2022-03-18T15:00:00+0000  ...              0
1  2022-03-17T17:30:00+0000  ...              3
2  2022-03-16T19:45:00+0000  ...              0
3  2022-03-15T22:15:00+0000  ...              1
4  2022-03-14T12:30:00+0000  ...              2

[5 rows x 4 columns]
```
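
A bare number such as `200` or `400` is easier to read with its standard reason phrase next to it. The following is a small sketch using only the standard library; `describe_status` is a hypothetical helper, not part of the function above.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return e.g. '200 OK' for a numeric HTTP status code."""
    try:
        return f"{status_code} {HTTPStatus(status_code).phrase}"
    except ValueError:
        # Raised when the standard library does not know this status code.
        return f"{status_code} (unrecognized)"


print("Response Status Code:", describe_status(200))
print("Response Status Code:", describe_status(400))
```

Printing this immediately after `requests.get(...)`, before any parsing, would make the status visible even when a later line fails.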

### Test the code

In [25]:
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, and comments.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count
        })
    df = pd.DataFrame(posts)
    print("Response Status Code:", status_code)
    return df, data

df, data = get_facebook_posts(group_id, access_token, limit=5)

# Print the DataFrame and response status code
print(df.head())

Response Status Code: 200
               created_time  \
0  2023-03-16T18:50:55+0000   
1  2023-03-15T20:56:22+0000   
2  2023-03-15T07:02:37+0000   
3  2023-03-14T19:04:59+0000   
4  2023-03-13T18:45:59+0000   

                                             message  \
0  If you’re constantly worried about calories an...   
1  You’ve heard the saying ‘a dog is for life - n...   
2  Calories in versus calories out (how do I over...   
3  The humble back photo 🔙\n\nIn my opinion, usua...   
4  It would actually break your heart to read my ...   

                                       permalink_url  comments_count  
0  https://www.facebook.com/groups/strongandsassy...               0  
1  https://www.facebook.com/groups/strongandsassy...               0  
2  https://www.facebook.com/groups/strongandsassy...               2  
3  https://www.facebook.com/groups/strongandsassy...               6  
4  https://www.facebook.com/groups/strongandsassy...               0  


In [26]:
df

| | created_time | message | permalink_url | comments_count |
|---|---|---|---|---|
| 0 | 2023-03-16T18:50:55+0000 | If you’re constantly worried about calories an... | https://www.facebook.com/groups/strongandsassy... | 0 |
| 1 | 2023-03-15T20:56:22+0000 | You’ve heard the saying ‘a dog is for life - n... | https://www.facebook.com/groups/strongandsassy... | 0 |
| 2 | 2023-03-15T07:02:37+0000 | Calories in versus calories out (how do I over... | https://www.facebook.com/groups/strongandsassy... | 2 |
| 3 | 2023-03-14T19:04:59+0000 | The humble back photo 🔙\n\nIn my opinion, usua... | https://www.facebook.com/groups/strongandsassy... | 6 |
| 4 | 2023-03-13T18:45:59+0000 | It would actually break your heart to read my ... | https://www.facebook.com/groups/strongandsassy... | 0 |


## My thoughts
This table includes the post message, but now the comments are only summarized as a count instead of showing the comment text.
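
The count-only result follows from `comments.limit(0).summary(true)` in the `fields` parameter: `limit(0)` asks for zero comment objects and `summary(true)` adds the total. Below is a minimal sketch of a `fields` value that also requests comment text. It assumes the Graph API's nested field expansion syntax (`edge{field1,field2}`) is accepted for this edge and token, which should be checked against the API documentation; `build_fields` and `comment_limit` are hypothetical names.

```python
def build_fields(comment_limit=25):
    """Build a `fields` value that asks for comment text, not just a count."""
    # Assumption: these sub-fields are readable with the token in use.
    comment_fields = "message,created_time,from"
    return (
        "created_time,message,permalink_url,"
        f"comments.limit({comment_limit}).summary(true){{{comment_fields}}}"
    )


print(build_fields(10))
```

The returned string would replace the `"fields"` entry in `params`. Whether the `from` sub-field is populated depends on the permissions granted to the access token.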

## Prompt 4: "Thank you. Please modify the function to include the post_id."

### Response:
Sure, you can modify the get_facebook_posts() function to include the post_id by updating the posts list to include this field:

```python
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, comments, and post ID.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count,
            "post_id": post_id
        })
    df = pd.DataFrame(posts)
    print("Response Status Code:", status_code)
    return df, data
```
In this modified version of the function, we added a new variable post_id to capture the post ID using the post.get("id") method. We then added this field to the posts list to include it in the output DataFrame.

Now when you call the get_facebook_posts() function, it will include the post_id field in the output DataFrame. For example:

```python
# Set the required parameters
group_id = "123456789"  # Replace with your group ID
access_token = "your-access-token"  # Replace with your access token

# Call the function to retrieve the posts
df, data = get_facebook_posts(group_id, access_token, limit=50)

# Print the DataFrame and response status code
print(df.head())
```
This will print out the first 5 rows of the DataFrame along with the response status code, now including the post_id field:

```text
Response Status Code: 200
                created_time  ...                          post_id
0  2022-03-18T15:00:00+0000  ...  123456789_9876543212345678901
1  2022-03-17T17:30:00+0000  ...  123456789_8765432109876543210
2  2022-03-16T19:45:00+0000  ...  123456789_7654321098765432109
3  2022-03-15T22:15:00+0000  ...  123456789_6543210987654321098
4  2022-03-14T12:
```

### Test the code

In [None]:
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message, permalink, comments, and post ID.
    Also outputs the response.json().
    
    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).
    
    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "created_time": created_time,
            "message": message,
            "permalink_url": permalink_url,
            "comments_count": comments_count,
            "post_id": post_id
        })
    df = pd.DataFrame(posts)
    print("Response Status Code:", status_code)
    return df, data

df, data = get_facebook_posts(group_id, access_token, limit=5)


In [28]:
df

| | created_time | message | permalink_url | comments_count | post_id |
|---|---|---|---|---|---|
| 0 | 2023-03-16T18:50:55+0000 | If you’re constantly worried about calories an... | https://www.facebook.com/groups/strongandsassy... | 0 | 2139238999669147_3367071606885874 |
| 1 | 2023-03-15T20:56:22+0000 | You’ve heard the saying ‘a dog is for life - n... | https://www.facebook.com/groups/strongandsassy... | 0 | 2139238999669147_3366477613611940 |
| 2 | 2023-03-15T07:02:37+0000 | Calories in versus calories out (how do I over... | https://www.facebook.com/groups/strongandsassy... | 2 | 2139238999669147_3366060270320341 |
| 3 | 2023-03-14T19:04:59+0000 | The humble back photo 🔙\n\nIn my opinion, usua... | https://www.facebook.com/groups/strongandsassy... | 6 | 2139238999669147_3365771280349240 |
| 4 | 2023-03-13T18:45:59+0000 | It would actually break your heart to read my ... | https://www.facebook.com/groups/strongandsassy... | 0 | 2139238999669147_3365106907082344 |


### My thoughts
Next step is to merge the comments and posts
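
One way to prepare for that merge is to flatten the nested comments into one row per comment, keyed by `post_id`. The sketch below assumes each post in the payload carries a `comments` object with a `data` list (which requires requesting comment fields rather than `limit(0)`); `flatten_comments` and the sample data are hypothetical.

```python
def flatten_comments(posts):
    """Return one row per comment, each tagged with its parent post_id."""
    rows = []
    for post in posts:
        for comment in post.get("comments", {}).get("data", []):
            rows.append({
                "post_id": post.get("id"),
                "comment_message": comment.get("message"),
                "comment_created_time": comment.get("created_time"),
                # "from" may be absent if the token cannot see the author.
                "comment_author": comment.get("from", {}).get("name"),
            })
    return rows


# Made-up sample shaped like response.json()["data"].
sample_posts = [
    {
        "id": "1_10",
        "comments": {"data": [
            {"message": "Great post", "created_time": "2023-03-15T08:00:00+0000",
             "from": {"name": "Example User"}},
            {"message": "Thanks!", "created_time": "2023-03-15T09:30:00+0000"},
        ]},
    },
    {"id": "1_11"},  # a post with no comments contributes no rows
]

comment_rows = flatten_comments(sample_posts)
for r in comment_rows:
    print(r["post_id"], r["comment_author"], r["comment_message"])
```

Passing `comment_rows` to `pd.DataFrame()` gives a comments table that can be joined to the posts DataFrame on the shared `post_id` column, for example with `DataFrame.merge`.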

## Prompt 5
Modify the function to include its reactions. The order of the first few columns in the data frame should be post_id, created_time, message, comments_count. 

### Response
Sure! Here's a modified version of the get_facebook_posts() function that includes reactions data for each post and reorders the columns in the output DataFrame as requested:
```python
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message,
    permalink, comments count, reactions count, and post ID.
    Also outputs the response.json().

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).

    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true),reactions.limit(0).summary(true)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions_count = post.get("reactions", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "reactions_count": reactions_count,
            "permalink_url": permalink_url,
        })
    df = pd.DataFrame(posts)
    # Reorder columns as requested
    df = df[["post_id", "created_time", "message", "comments_count", "reactions_count", "permalink_url"]]
    print("Response Status Code:", status_code)
    return df, data
```
In this modified version of the function, we added a new field reactions_count to capture the total count of reactions for each post using the post.get("reactions", {}).get("summary", {}).get("total_count", 0) method. We then added this field to the posts list to include it in the output DataFrame.

We also reordered the columns in the output DataFrame to match the requested order by adding the line df = df[["post_id", "created_time", "message", "comments_count", "reactions_count", "permalink_url"]] before returning the DataFrame.

Now when you call the get_facebook_posts() function, it will include the reactions_count field in the output DataFrame and reorder the columns as requested. For example:
```python
# Set the required parameters
group_id = "123456789"  # Replace with your group ID
access_token = "your-access-token"  # Replace with your access token

# Call the function to retrieve the posts
df, data = get_facebook_posts(group_id, access_token, limit=50)

# Print the DataFrame and response status code
print(df.head())
```
This will print out the first 5 rows of the DataFrame along with the response status code, now including the reactions_count field.

## Prompt 7
Modify the function so the dataframe has a separate column for each type of reaction.

### Response
Sure! Here's a modified version of the get_facebook_posts() function that includes separate columns for each type of reaction in the output DataFrame:
```python
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message,
    permalink, comments count, reactions, and post ID.
    Also outputs the response.json().

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).

    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true),reactions.limit(0).summary(true).type(LIKE).as(like),reactions.limit(0).summary(true).type(LOVE).as(love),reactions.limit(0).summary(true).type(WOW).as(wow),reactions.limit(0).summary(true).type(HAHA).as(haha),reactions.limit(0).summary(true).type(SAD).as(sad),reactions.limit(0).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post.get("wow", {}).get("summary", {}).get("total_count", 0), post.get("haha", {}).get("summary", {}).get("total_count", 0), post.get("sad", {}).get("summary", {}).get("total_count", 0), post.get("angry", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })
    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    print("Response Status Code:", status_code)
    return df, data
```
In this modified version of the function, we added separate fields for each type of reaction using the `

### Test the code

In [30]:
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message,
    permalink, comments count, reactions, and post ID.
    Also outputs the response.json().

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).

    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true),reactions.limit(0).summary(true).type(LIKE).as(like),reactions.limit(0).summary(true).type(LOVE).as(love),reactions.limit(0).summary(true).type(WOW).as(wow),reactions.limit(0).summary(true).type(HAHA).as(haha),reactions.limit(0).summary(true).type(SAD).as(sad),reactions.limit(0).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post.get("wow", {}).get("summary", {}).get("total_count", 0), post.get("haha", {}).get("summary", {}).get("total_count", 0), post.get("sad", {}).get("summary", {}).get("total_count", 0), post.get("angry", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })
    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    print("Response Status Code:", status_code)
    return df, data

df, data = get_facebook_posts(group_id, access_token, limit=5)

Response Status Code: 200


In [31]:
df

| | post_id | created_time | message | comments_count | like_count | love_count | wow_count | haha_count | sad_count | angry_count | permalink_url |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2139238999669147_3367071606885874 | 2023-03-16T18:50:55+0000 | If you’re constantly worried about calories an... | 0 | 4 | 1 | 0 | 0 | 0 | 0 | https://www.facebook.com/groups/strongandsassy... |
| 1 | 2139238999669147_3366477613611940 | 2023-03-15T20:56:22+0000 | You’ve heard the saying ‘a dog is for life - n... | 0 | 1 | 4 | 0 | 0 | 0 | 0 | https://www.facebook.com/groups/strongandsassy... |
| 2 | 2139238999669147_3366060270320341 | 2023-03-15T07:02:37+0000 | Calories in versus calories out (how do I over... | 2 | 5 | 3 | 0 | 0 | 0 | 0 | https://www.facebook.com/groups/strongandsassy... |
| 3 | 2139238999669147_3365771280349240 | 2023-03-14T19:04:59+0000 | The humble back photo 🔙\n\nIn my opinion, usua... | 6 | 15 | 6 | 0 | 0 | 0 | 0 | https://www.facebook.com/groups/strongandsassy... |
| 4 | 2139238999669147_3365106907082344 | 2023-03-13T18:45:59+0000 | It would actually break your heart to read my ... | 0 | 3 | 0 | 0 | 0 | 0 | 0 | https://www.facebook.com/groups/strongandsassy... |
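
The per-reaction columns come from six aliased `reactions` sub-queries written out by hand, plus a matching six-element tuple. As a sketch of the same technique with less repetition, both the `fields` fragment and the count columns can be generated from one list. `REACTION_TYPES`, `reaction_fields`, and `reaction_counts` are hypothetical names; the list holds only the six types used in the function above.

```python
REACTION_TYPES = ["LIKE", "LOVE", "WOW", "HAHA", "SAD", "ANGRY"]


def reaction_fields(types=REACTION_TYPES):
    """Build the aliased reactions sub-queries for the `fields` parameter."""
    return ",".join(
        f"reactions.limit(0).summary(true).type({t}).as({t.lower()})" for t in types
    )


def reaction_counts(post, types=REACTION_TYPES):
    """Read each aliased summary from a post into '<type>_count' columns."""
    return {
        f"{t.lower()}_count": post.get(t.lower(), {}).get("summary", {}).get("total_count", 0)
        for t in types
    }


# Sample post; the counts mirror the first row of the output above.
sample = {"like": {"summary": {"total_count": 4}}, "love": {"summary": {"total_count": 1}}}

print(reaction_fields(["LIKE", "LOVE"]))
print(reaction_counts(sample))
```

In the function, the dictionary from `reaction_counts(post)` could be merged into each row with `**reaction_counts(post)`, and adding a reaction type would then mean editing only the list.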


In [41]:
data['data'][0]

{'created_time': '2023-03-16T18:50:55+0000',
 'message': 'If you’re constantly worried about calories and macros and not worried about micros. You’re missing out and here’s why ⬇️\n\nFruit and veggies are jam packed with vitamins and minerals. Although needed in small quantities in the body - the absence of them can be detrimental to our overall health. \n\nA varied diet contains antioxidants, photo nutrients and anti inflammatory properties that is optimal for overall health (not just externally) \n\nThey are a source of soluble and insoluble fibre which is vitally important for gut health. Happy gut = Happy body.\n\nFruit is an overlooked source of energy 🙌\n\nWhen it comes to your nutrition stop looking at it from an EXCLUSIVE point of view and start looking at it INCLUSIVELY! Health is not just what you look like but how you feel ❤️\n\nDrop me a like, comment or follow if you enjoyed this!',
 'permalink_url': 'https://www.facebook.com/groups/strongandsassycommunity

### My thoughts: This is impressive

# Session 2. Create the initial code with ChatGPT UI
## Prompt 2.1
Modify the `get_facebook_posts` function to also return a DataFrame that has each post's comments, but without increasing the number of API calls. Each row should have the post_id, comment message, comment time stamp, and comment author. Include the code for calling the function.

Here is the `get_facebook_posts` function to modify:
```python
def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message,
    permalink, comments count, reactions, and post ID.
    Also outputs the response.json().

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).

    Returns:
        pandas.DataFrame: A DataFrame containing the posts and their associated data.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true),reactions.limit(0).summary(true).type(LIKE).as(like),reactions.limit(0).summary(true).type(LOVE).as(love),reactions.limit(0).summary(true).type(WOW).as(wow),reactions.limit(0).summary(true).type(HAHA).as(haha),reactions.limit(0).summary(true).type(SAD).as(sad),reactions.limit(0).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    data = response.json()
    posts = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post.get("wow", {}).get("summary", {}).get("total_count", 0), post.get("haha", {}).get("summary", {}).get("total_count", 0), post.get("sad", {}).get("summary", {}).get("total_count", 0), post.get("angry", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })
    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    print("Response Status Code:", status_code)
    return df, data
```
### Response
To modify the get_facebook_posts function to also return comments of each post without increasing the number of API calls, we can make use of the {post_id}/comments edge in the Facebook Graph API.

Here's the modified get_facebook_posts function:
```python
def get_facebook_posts(group_id, access_token, limit=100):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Outputs the results into a DataFrame so that each post is a row along with its time stamp, message,
    permalink, comments count, reactions, and post ID. The function also returns a DataFrame with each post's comments,
    comment message, comment time stamp, and comment author.
    Also outputs the response.json().

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 100).

    Returns:
        tuple(pandas.DataFrame, pandas.DataFrame): A tuple of two DataFrames containing the posts and their associated data.
    """
    # Get posts data
    url = f"https://graph.facebook.com/{group_id}/feed"
    post_params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(0).summary(true),reactions.limit(0).summary(true).type(LIKE).as(like),reactions.limit(0).summary(true).type(LOVE).as(love),reactions.limit(0).summary(true).type(WOW).as(wow),reactions.limit(0).summary(true).type(HAHA).as(haha),reactions.limit(0).summary(true).type(SAD).as(sad),reactions.limit(0).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    post_response = requests.get(url, params=post_params)
    post_status_code = post_response.status_code
    post_data = post_response.json()

    # Get comments data
    comments_params = {
        "access_token": access_token,
        "fields": "message,from,created_time",
        "filter": "stream"
    }
    comments_df = pd.DataFrame()
    for post in post_data["data"]:
        post_id = post["id"]
        comments_url = f"https://graph.facebook.com/{post_id}/comments"
        comments_response = requests.get(comments_url, params=comments_params)
        comments_status_code = comments_response.status_code
        comments_data = comments_response.json()
        comments = []
        for comment in comments_data["data"]:
            message = comment.get("message")
            author = comment.get("from", {}).get("name")
            created_time = comment.get("created_time")
            comments.append({
                "post_id": post_id,
                "message": message,
                "author": author,
                "created_time": created_time
            })
        comments_df = pd.concat([comments_df, pd.DataFrame(comments)])
    comments_df.reset_index(drop=True, inplace=True)

    # Get posts DataFrame
    posts = []
    for post in post_data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post

```

### My thoughts: Code is incomplete
I will modify the code from previous steps to create a new function to get the results with fewer API calls.
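
The `fields` string in the request is long and repetitive, so one way to keep it manageable while adjusting limits is to assemble it from a list of reaction types. The sketch below is only an illustration: `build_fields`, `REACTION_TYPES`, `comment_limit` and `reaction_limit` are hypothetical names that do not appear in the original code, and the helper builds a string without calling the API.

```python
# Hypothetical helper (not part of the original notebook): assembles the
# Graph API `fields` parameter used by get_facebook_posts.
REACTION_TYPES = ["LIKE", "LOVE", "WOW", "HAHA", "SAD", "ANGRY"]

def build_fields(comment_limit=0, reaction_limit=0):
    # Post-level fields plus the comments edge with a summary count
    base = [
        "created_time",
        "message",
        "permalink_url",
        f"comments.limit({comment_limit}).summary(true)",
    ]
    # One aliased reactions edge per reaction type, e.g. ...type(LIKE).as(like)
    reactions = [
        f"reactions.limit({reaction_limit}).summary(true).type({r}).as({r.lower()})"
        for r in REACTION_TYPES
    ]
    return ",".join(base + reactions)

fields = build_fields(comment_limit=10)
print(fields)
```

Raising `comment_limit` above 0 asks the API to embed up to that many comments inside each post, so posts and comments arrive in the same response.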

# Modifying ChatGPT's code

## Iteration 4
Separate the function into two: one to make the API call, another to process the data.

In [17]:
## I will use this as part of the final code for the task
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=25):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Requests each post's time stamp, message, permalink, up to 10 comments, and reaction counts,
    and returns the raw response.json() so that it can be processed separately.

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 25).

    Returns:
        dict: The parsed JSON response; the posts are listed under the "data" key.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"

    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(10).summary(true),reactions.limit(500).summary(true).type(LIKE).as(like),reactions.limit(500).summary(true).type(LOVE).as(love),reactions.limit(500).summary(true).type(WOW).as(wow),reactions.limit(500).summary(true).type(HAHA).as(haha),reactions.limit(500).summary(true).type(SAD).as(sad),reactions.limit(500).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    print("Response Status Code:", status_code)
    data = response.json()
    return data


# Call the function
posts_data = get_facebook_posts(group_id, access_token, limit=10)


Response Status Code: 200
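
The processing code in this notebook indexes `data["data"]` directly, which raises a `KeyError` when the request fails. A minimal sketch of a guard is shown below, assuming that failed requests return a JSON body with a top-level `error` object carrying `message` and `code` keys. `extract_posts` is a hypothetical helper name, and the sample payloads are made up for illustration.

```python
def extract_posts(payload):
    """Return the list of posts, or raise if the payload describes an error."""
    # Assumption: error responses look like {"error": {"message": ..., "code": ...}}
    if "error" in payload:
        err = payload["error"]
        raise RuntimeError(f"Graph API error {err.get('code')}: {err.get('message')}")
    # A successful feed response lists the posts under "data"
    return payload.get("data", [])

# Made-up payloads for illustration only
ok_payload = {"data": [{"id": "1_2", "message": "hello"}]}
bad_payload = {"error": {"message": "Invalid OAuth access token.", "code": 190}}

print(len(extract_posts(ok_payload)))
try:
    extract_posts(bad_payload)
except RuntimeError as exc:
    print(exc)
```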


In [19]:
posts_data['data']

[{'created_time': '2023-03-21T19:11:52+0000',
  'message': '@everyone for team training this week. A vast majority of you said you struggled with nutrition - give me some nutrition questions that I can discuss on the live training this week! No question is silly - all are relevant but I want to be able to help ❤️💪',
  'permalink_url': 'https://www.facebook.com/groups/strongandsassycommunity/permalink/3370347119891656/',
  'comments': {'data': [{'created_time': '2023-03-21T20:45:46+0000',
     'message': 'are protein bars bad for you as they are so processed',
     'id': '3370383936554641'},
    {'created_time': '2023-03-21T21:25:54+0000',
     'message': 'Is protein really important ... should do we need to take protein supplements or powder...\nOr',
     'id': '3370399599886408'},
    {'created_time': '2023-03-21T23:31:32+0000',
     'message': 'For those of us entering perimenopause, what should we be conscious of in our diet?',
     'id': '3370448433214858'},
    {'created_ti

In [6]:
def process_response(data):
    posts = []
    comments = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post.get("wow", {}).get("summary", {}).get("total_count", 0), post.get("haha", {}).get("summary", {}).get("total_count", 0), post.get("sad", {}).get("summary", {}).get("total_count", 0), post.get("angry", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            post_comments.append({
                "post_id": post["id"],
                "post_created_time": post.get("created_time", ""),
                "comment_id": comment.get("id", ""),
                "comment_created_time": comment.get("created_time", ""),
                "comment_message": comment.get("message", ""),
            })
    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    
    results = dict()
    results['posts'] = df
    results['comments'] = pd.DataFrame(post_comments)
    results['response'] = data
    return results

results = process_response(data)

### Results: 
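The API call worked as expected and returned the desired data, but parsing the data into DataFrames crashed the system. The comments loop appends to `post_comments`, the same list it is iterating over, so for any post with at least one comment the loop never finishes and the list grows without bound. The `comments` list created at the top of the function is never used.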
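
The sketch below parses the embedded comments by reading from the list returned by the API and writing to a separate `comments` list, which avoids growing a list while looping over it. It is a trimmed-down illustration rather than the final code: `process_response_with_comments` is a hypothetical name, the reaction columns are omitted for brevity, and the sample payload is made up.

```python
import pandas as pd

def process_response_with_comments(data):
    posts = []
    comments = []
    for post in data.get("data", []):
        posts.append({
            "post_id": post.get("id"),
            "created_time": post.get("created_time"),
            "message": post.get("message"),
            "comments_count": post.get("comments", {}).get("summary", {}).get("total_count", 0),
        })
        # Iterate over the API's list, append to our own list
        for comment in post.get("comments", {}).get("data", []):
            comments.append({
                "post_id": post.get("id"),
                "post_created_time": post.get("created_time", ""),
                "comment_id": comment.get("id", ""),
                "comment_created_time": comment.get("created_time", ""),
                "comment_message": comment.get("message", ""),
            })
    return {
        "posts": pd.DataFrame(posts),
        "comments": pd.DataFrame(comments),
        "response": data,
    }

# Made-up payload in the same shape as the feed response
sample = {"data": [
    {"id": "g_p1", "created_time": "2023-03-21T19:11:52+0000", "message": "post one",
     "comments": {"data": [
         {"id": "c1", "created_time": "2023-03-21T20:45:46+0000", "message": "first"},
         {"id": "c2", "created_time": "2023-03-21T21:25:54+0000", "message": "second"},
     ], "summary": {"total_count": 2}}},
    {"id": "g_p2", "created_time": "2023-03-20T18:56:34+0000", "message": "post two"},
]}

results = process_response_with_comments(sample)
print(results["posts"])
print(results["comments"])
```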

## Iteration 5
Try just parsing the posts data without parsing comments data

In [31]:
## I will use this as part of the final code for the task
def process_response(data):
    posts = []
    comments = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        reactions = post.get("like", {}).get("summary", {}).get("total_count", 0), post.get("love", {}).get("summary", {}).get("total_count", 0), post.get("wow", {}).get("summary", {}).get("total_count", 0), post.get("haha", {}).get("summary", {}).get("total_count", 0), post.get("sad", {}).get("summary", {}).get("total_count", 0), post.get("angry", {}).get("summary", {}).get("total_count", 0)
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })

    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    
    return df

posts_df = process_response(posts_data)
posts_df.head(10)

Unnamed: 0,post_id,created_time,message,comments_count,like_count,love_count,wow_count,haha_count,sad_count,angry_count,permalink_url
0,2139238999669147_3370347119891656,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,6,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
1,2139238999669147_3370342969892071,2023-03-21T19:02:42+0000,How shocked would you be if I told you - there...,1,3,6,3,0,0,0,https://www.facebook.com/groups/strongandsassy...
2,2139238999669147_3369703796622655,2023-03-20T18:56:34+0000,Why your morning routine may actually be NEGAT...,1,5,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
3,2139238999669147_3367071606885874,2023-03-16T18:50:55+0000,If you’re constantly worried about calories an...,0,4,1,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
4,2139238999669147_3366477613611940,2023-03-15T20:56:22+0000,You’ve heard the saying ‘a dog is for life - n...,0,1,4,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
5,2139238999669147_3366060270320341,2023-03-15T07:02:37+0000,Calories in versus calories out (how do I over...,2,5,3,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
6,2139238999669147_3365771280349240,2023-03-14T19:04:59+0000,"The humble back photo 🔙\n\nIn my opinion, usua...",6,15,6,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
7,2139238999669147_3365106907082344,2023-03-13T18:45:59+0000,It would actually break your heart to read my ...,0,3,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
8,2139238999669147_3364783420448026,2023-03-13T07:29:09+0000,@everyone would love your help with the live t...,1,0,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
9,2139238999669147_3364475920478776,2023-03-12T18:50:47+0000,Have your ever tried on a pair of shorts - jus...,1,4,6,0,0,0,0,https://www.facebook.com/groups/strongandsassy...


In [39]:
posts_df.shape

(10, 11)

### Results
Parsing the posts data was not a problem. The system crashes when trying to parse the comments data. Instead, I will use ChatGPT to make additional API calls to pull the comments from Facebook posts.

# Session 3
## Prompt 3.1 
"Create a python function that will make a GET request to pull comments from a Facebook group using the Facebook API. Output the results into a DataFrame so that each comment is a row along with its details and replies."

### Response
To make a GET request to pull comments from a Facebook group using the Facebook API in Python, you can use the requests library. Here is an example function that does this and outputs the results into a pandas DataFrame:

```python
import requests
import pandas as pd

def get_fb_comments(group_id, access_token):
    """
    Pulls comments from a Facebook group using the Facebook API and outputs the results into a pandas DataFrame.
    
    Parameters:
    group_id (str): The ID of the Facebook group to pull comments from.
    access_token (str): A Facebook access token with the necessary permissions to access the group.
    
    Returns:
    pandas.DataFrame: A DataFrame with each comment as a row along with its details and replies.
    """
    # Set up the API endpoint and parameters
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "fields": "comments{created_time,message,from,comments{created_time,message,from}}",
        "access_token": access_token,
    }

    # Make the GET request
    response = requests.get(url, params=params)
    
    # Convert the response to a DataFrame
    comments = []
    for post in response.json()["data"]:
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            comment_dict = {
                "post_id": post.get("id"),
                "created_time": comment.get("created_time"),
                "message": comment.get("message"),
                "author_id": comment.get("from", {}).get("id"),
                "author_name": comment.get("from", {}).get("name"),
                "parent_id": None,
                "parent_name": None,
            }
            parent_comment = comment.get("comments", {}).get("data", [])
            if parent_comment:
                parent_comment = parent_comment[0]
                comment_dict["parent_id"] = parent_comment.get("id")
                comment_dict["parent_name"] = parent_comment.get("from", {}).get("name")
            comments.append(comment_dict)
    df = pd.DataFrame(comments)
    return df
```
To use this function, you'll need to provide the Facebook group ID and an access token with the necessary permissions to access the group. You can get an access token by creating a Facebook app and following the OAuth authentication flow.

Here's an example of how to use the function:
```python
group_id = "1234567890"  # replace with your Facebook group ID
access_token = "your_access_token_here"
df = get_fb_comments(group_id, access_token)
print(df.head())
```
This will print the first five rows of the DataFrame, which should contain the details of the comments and their replies.

## Prompt 3.2
"Modify get_fb_comment function to also return the response.json()"

### Response
Sure, here's a modified version of the get_fb_comments function that also returns the full JSON response:
```python
import requests
import pandas as pd

def get_fb_comments(group_id, access_token):
    """
    Pulls comments from a Facebook group using the Facebook API and outputs the results into a pandas DataFrame.
    
    Parameters:
    group_id (str): The ID of the Facebook group to pull comments from.
    access_token (str): A Facebook access token with the necessary permissions to access the group.
    
    Returns:
    Tuple(pandas.DataFrame, dict): A tuple containing a DataFrame with each comment as a row along with its details and replies,
    and the full JSON response from the API.
    """
    # Set up the API endpoint and parameters
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "fields": "comments{created_time,message,from,comments{created_time,message,from}}",
        "access_token": access_token,
    }

    # Make the GET request
    response = requests.get(url, params=params)
    
    # Convert the response to a DataFrame
    comments = []
    for post in response.json()["data"]:
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            comment_dict = {
                "post_id": post.get("id"),
                "created_time": comment.get("created_time"),
                "message": comment.get("message"),
                "author_id": comment.get("from", {}).get("id"),
                "author_name": comment.get("from", {}).get("name"),
                "parent_id": None,
                "parent_name": None,
            }
            parent_comment = comment.get("comments", {}).get("data", [])
            if parent_comment:
                parent_comment = parent_comment[0]
                comment_dict["parent_id"] = parent_comment.get("id")
                comment_dict["parent_name"] = parent_comment.get("from", {}).get("name")
            comments.append(comment_dict)
    df = pd.DataFrame(comments)
    return df, response.json()
```
This version of the function returns a tuple containing both the DataFrame with the comments and the full JSON response from the API. You can call the function like this:
```python
group_id = "1234567890"  # replace with your Facebook group ID
access_token = "your_access_token_here"
df, response = get_fb_comments(group_id, access_token)
print(df.head())
print(response)
```
This will print the first five rows of the DataFrame and the full JSON response from the API.

### Run the code

In [None]:
import requests
import pandas as pd

def get_fb_comments(group_id, access_token):
    """
    Pulls comments from a Facebook group using the Facebook API and outputs the results into a pandas DataFrame.
    
    Parameters:
    group_id (str): The ID of the Facebook group to pull comments from.
    access_token (str): A Facebook access token with the necessary permissions to access the group.
    
    Returns:
    Tuple(pandas.DataFrame, dict): A tuple containing a DataFrame with each comment as a row along with its details and replies,
    and the full JSON response from the API.
    """
    # Set up the API endpoint and parameters
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "fields": "comments{created_time,message,from,comments{created_time,message,from}}",
        "access_token": access_token,
    }

    # Make the GET request
    response = requests.get(url, params=params)
    
    # Convert the response to a DataFrame
    comments = []
    for post in response.json()["data"]:
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            comment_dict = {
                "post_id": post.get("id"),
                "created_time": comment.get("created_time"),
                "message": comment.get("message"),
                "author_id": comment.get("from", {}).get("id"),
                "author_name": comment.get("from", {}).get("name"),
                "parent_id": None,
                "parent_name": None,
            }
            parent_comment = comment.get("comments", {}).get("data", [])
            if parent_comment:
                parent_comment = parent_comment[0]
                comment_dict["parent_id"] = parent_comment.get("id")
                comment_dict["parent_name"] = parent_comment.get("from", {}).get("name")
            comments.append(comment_dict)
    df = pd.DataFrame(comments)
    return df, response.json()


# group_id and access_token were loaded from credentials.json at the top of the notebook
df, response = get_fb_comments(group_id, access_token)


In [36]:
df

Unnamed: 0,post_id,created_time,message,author_id,author_name,parent_id,parent_name
0,2139238999669147_3370347119891656,2023-03-21T20:45:46+0000,are protein bars bad for you as they are so pr...,,,,
1,2139238999669147_3370347119891656,2023-03-21T21:25:54+0000,Is protein really important ... should do we n...,,,,
2,2139238999669147_3370347119891656,2023-03-21T23:31:32+0000,"For those of us entering perimenopause, what s...",,,,
3,2139238999669147_3370347119891656,2023-03-22T03:42:17+0000,What snacks should we lean more towards when s...,,,,
4,2139238999669147_3370347119891656,2023-03-22T04:33:53+0000,These are bloody fantastic questions! I will c...,,,,
5,2139238999669147_3370347119891656,2023-03-22T04:34:42+0000,@everyone \n7am Sydney Time Thursday \n8pm Iri...,,,,
6,2139238999669147_3370342969892071,2023-03-22T05:39:10+0000,Ree Waugh,,,3370610399865328.0,
7,2139238999669147_3369703796622655,2023-03-21T04:42:48+0000,Set my alarm for 5am today to walk Eddie and b...,,,3369948876598147.0,
8,2139238999669147_3366060270320341,2023-03-15T13:38:20+0000,Thank you for the info Amanda. I'm struggling...,,,3366487800277588.0,
9,2139238999669147_3366060270320341,2023-03-15T07:39:21+0000,Thank you x,,,3366074393652262.0,


### Modify the code to my needs
Given the token limits for ChatGPT, it would be more practical for me to adapt ChatGPT's code to create the output I want at this point.

In [20]:
# I will use this in the final code for the task
import requests
import pandas as pd

def get_fb_comments(group_id, access_token, limit=25):
    """
    Pulls comments from a Facebook group using the Facebook API and outputs the results into a pandas DataFrame.
    
    Parameters:
    group_id (str): The ID of the Facebook group to pull comments from.
    access_token (str): A Facebook access token with the necessary permissions to access the group.
    limit (int): The number of posts whose comments are requested (default is 25).
    
    Returns:
    Tuple(pandas.DataFrame, dict): A tuple containing a DataFrame with each comment as a row along with its details and replies,
    and the full JSON response from the API.
    """
    # Set up the API endpoint and parameters
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "fields": "comments{created_time,message,from,comments{created_time,message,from}}",
        "access_token": access_token,
        "limit": limit
    }

    # Make the GET request
    response = requests.get(url, params=params)
    
    # Convert the response to a DataFrame
    comments = []
    for post in response.json()["data"]:
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            comment_dict = {
                "post_id": post.get("id"),
                "created_time": comment.get("created_time"),
                "message": comment.get("message"),
                "author_id": comment.get("from", {}).get("id"),
                "author_name": comment.get("from", {}).get("name"),
                "parent_id": None,
                "parent_name": None,
            }
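            # The nested "comments" edge holds the replies to this comment.
            # Only the first reply is recorded, under the parent_* columns.
            parent_comment = comment.get("comments", {}).get("data", [])
            if parent_comment:
                parent_comment = parent_comment[0]
                comment_dict["parent_id"] = parent_comment.get("id")
                comment_dict["parent_name"] = parent_comment.get("from", {}).get("name")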
            comments.append(comment_dict)
    df = pd.DataFrame(comments)
    return df, response.json()


# group_id and access_token were loaded from credentials.json at the top of the notebook
comments_df, comments_response = get_fb_comments(group_id, access_token)


In [35]:
comments_response['data']

[{'comments': {'data': [{'created_time': '2023-03-21T20:45:46+0000',
     'message': 'are protein bars bad for you as they are so processed',
     'id': '3370383936554641'},
    {'created_time': '2023-03-21T21:25:54+0000',
     'message': 'Is protein really important ... should do we need to take protein supplements or powder...\nOr',
     'id': '3370399599886408'},
    {'created_time': '2023-03-21T23:31:32+0000',
     'message': 'For those of us entering perimenopause, what should we be conscious of in our diet?',
     'id': '3370448433214858'},
    {'created_time': '2023-03-22T03:42:17+0000',
     'message': 'What snacks should we lean more towards when snacking and what should we limit in snacking moments?',
     'id': '3370548013204900'},
    {'created_time': '2023-03-22T04:33:53+0000',
     'message': 'These are bloody fantastic questions! I will cover all of these tomorrow \U0001fae1',
     'id': '3370568783202823'},
    {'created_time': '2023-03-22T04:34:42+0000',
     'message'

In [38]:
print(comments_df.shape)
comments_df.head(10)

(30, 7)


Unnamed: 0,post_id,created_time,message,author_id,author_name,parent_id,parent_name
0,2139238999669147_3370347119891656,2023-03-21T20:45:46+0000,are protein bars bad for you as they are so pr...,,,,
1,2139238999669147_3370347119891656,2023-03-21T21:25:54+0000,Is protein really important ... should do we n...,,,,
2,2139238999669147_3370347119891656,2023-03-21T23:31:32+0000,"For those of us entering perimenopause, what s...",,,,
3,2139238999669147_3370347119891656,2023-03-22T03:42:17+0000,What snacks should we lean more towards when s...,,,,
4,2139238999669147_3370347119891656,2023-03-22T04:33:53+0000,These are bloody fantastic questions! I will c...,,,,
5,2139238999669147_3370347119891656,2023-03-22T04:34:42+0000,@everyone \n7am Sydney Time Thursday \n8pm Iri...,,,,
6,2139238999669147_3370342969892071,2023-03-22T05:39:10+0000,Ree Waugh,,,3370610399865328.0,
7,2139238999669147_3369703796622655,2023-03-21T04:42:48+0000,Set my alarm for 5am today to walk Eddie and b...,,,3369948876598147.0,
8,2139238999669147_3366060270320341,2023-03-15T13:38:20+0000,Thank you for the info Amanda. I'm struggling...,,,3366487800277588.0,
9,2139238999669147_3366060270320341,2023-03-15T07:39:21+0000,Thank you x,,,3366074393652262.0,
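
In `get_fb_comments`, the `parent_id` column is filled from the first entry of each comment's nested `comments` list, so it holds the ID of the first reply, and any further replies are dropped. One possible alternative, sketched below under that reading of the response shape, is to emit every reply as its own row with a pointer back to the comment it answers. `flatten_comments`, `_row` and the `reply_to` column are hypothetical names, and the payload is made up.

```python
import pandas as pd

def _row(post, item, reply_to):
    # One output row for either a top-level comment or a reply
    return {
        "post_id": post.get("id"),
        "comment_id": item.get("id"),
        "reply_to": reply_to,  # None for top-level comments
        "created_time": item.get("created_time"),
        "message": item.get("message"),
        "author_name": item.get("from", {}).get("name"),
    }

def flatten_comments(payload):
    rows = []
    for post in payload.get("data", []):
        for comment in post.get("comments", {}).get("data", []):
            rows.append(_row(post, comment, reply_to=None))
            # Every reply becomes a row that points at its parent comment
            for reply in comment.get("comments", {}).get("data", []):
                rows.append(_row(post, reply, reply_to=comment.get("id")))
    return pd.DataFrame(rows)

# Made-up payload: one comment with two replies, and one post without comments
payload = {"data": [
    {"id": "g_p1", "comments": {"data": [
        {"id": "c1", "created_time": "2023-03-21T20:45:46+0000", "message": "top-level",
         "comments": {"data": [
             {"id": "r1", "message": "first reply"},
             {"id": "r2", "message": "second reply"},
         ]}},
    ]}},
    {"id": "g_p2"},
]}

flat = flatten_comments(payload)
print(flat[["post_id", "comment_id", "reply_to", "message"]])
```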


In [23]:
comments_df.columns

Index(['post_id', 'created_time', 'message', 'author_id', 'author_name',
       'parent_id', 'parent_name'],
      dtype='object')

In [32]:
posts_df.columns

Index(['post_id', 'created_time', 'message', 'comments_count', 'like_count',
       'love_count', 'wow_count', 'haha_count', 'sad_count', 'angry_count',
       'permalink_url'],
      dtype='object')

## Prompt 3.3: Merge comments and posts into a single dataframe
I have two dataframes:
1. The comments_df table has the following columns: ['post_id', 'created_time', 'message', 'author_id', 'author_name',
       'parent_id', 'parent_name']
2. The posts_df table has the following columns: ['post_id', 'created_time', 'message', 'comments_count', 'like_count',
       'love_count', 'wow_count', 'haha_count', 'sad_count', 'angry_count',
       'permalink_url']

Use Python to generate code to merge the two dataframes to create a table with the following columns:
post_id, 
post_message, 
post_created_time, 
comment_created_time, 
comment_message, 
parent_id, 
parent_name

The table should include all the comments. This means that rows from posts_df may be duplicated.

### Response
You can merge the two dataframes using the merge() method of pandas. You will need to merge on the post_id column, which is common to both dataframes. To include all comments, even those without a parent comment, use a left join so that all rows from the comments_df table are included. Here's the code:
```python
import pandas as pd

# read in the dataframes
comments_df = pd.read_csv('comments.csv')
posts_df = pd.read_csv('posts.csv')

# merge the dataframes
merged_df = pd.merge(comments_df, posts_df, on='post_id', how='left')

# select the desired columns
merged_df = merged_df[['post_id', 'message_y', 'created_time_y', 'created_time_x', 'message_x', 'parent_id', 'parent_name']]

# rename the columns
merged_df.columns = ['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'comment_message', 'parent_id', 'parent_name']

# save the merged dataframe to a csv file
merged_df.to_csv('merged.csv', index=False)
```
Here, the merge() method is used to join the two dataframes on the post_id column with a left join. The resulting dataframe includes all comments and may contain duplicated rows from the posts_df. Then, the desired columns are selected and the column names are renamed to match the desired output. Finally, the merged dataframe is saved to a csv file called merged.csv with the to_csv() method.

### Test out the code

In [33]:

# merge the dataframes
merged_df = pd.merge(comments_df, posts_df, on='post_id', how='left')

# select the desired columns
merged_df = merged_df[['post_id', 'message_y', 'created_time_y', 'created_time_x', 'message_x', 'parent_id', 'parent_name']]

# rename the columns
merged_df.columns = ['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'comment_message', 'parent_id', 'parent_name']

merged_df


Unnamed: 0,post_id,post_message,post_created_time,comment_created_time,comment_message,parent_id,parent_name
0,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T20:45:46+0000,are protein bars bad for you as they are so pr...,,
1,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T21:25:54+0000,Is protein really important ... should do we n...,,
2,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T23:31:32+0000,"For those of us entering perimenopause, what s...",,
3,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T03:42:17+0000,What snacks should we lean more towards when s...,,
4,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:33:53+0000,These are bloody fantastic questions! I will c...,,
5,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:34:42+0000,@everyone \n7am Sydney Time Thursday \n8pm Iri...,,
6,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370610399865328.0,
7,2139238999669147_3369703796622655,Why your morning routine may actually be NEGAT...,2023-03-20T18:56:34+0000,2023-03-21T04:42:48+0000,Set my alarm for 5am today to walk Eddie and b...,3369948876598147.0,
8,2139238999669147_3366060270320341,Calories in versus calories out (how do I over...,2023-03-15T07:02:37+0000,2023-03-15T13:38:20+0000,Thank you for the info Amanda. I'm struggling...,3366487800277588.0,
9,2139238999669147_3366060270320341,Calories in versus calories out (how do I over...,2023-03-15T07:02:37+0000,2023-03-15T07:39:21+0000,Thank you x,3366074393652262.0,


## Prompt 3.4: Pulling comment replies
I have a Python json object in the following format:
[{'comments': {'data': [{'created_time': '2023-03-21T20:45:46+0000',
     'message': 'are protein bars bad for you as they are so processed',
     'id': '3370383936554641'},
    {'created_time': '2023-03-21T21:25:54+0000',
     'message': 'Is protein really important ... should do we need to take protein supplements or powder...\nOr',
     'id': '3370399599886408'},
    {'created_time': '2023-03-22T04:34:42+0000',
     'message': '@everyone \n7am Sydney Time Thursday \n8pm Irish time \n\nKeep your eyes peeled!',
     'id': '3370569076536127'}],
   'paging': {'cursors': {'before': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURNNE16a3pOalUxTkRZAME1Ub3hOamM1TkRNeE5UUTMZD',
     'after': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURVMk9UQTNOalV6TmpFeU56b3hOamM1TkRVNU5qZA3oZD'}}},
  'id': '2139238999669147_3370347119891656'},
 {'comments': {'data': [{'created_time': '2023-03-22T05:39:10+0000',
     'message': 'Ree Waugh',
     'comments': {'data': [{'created_time': '2023-03-22T06:20:38+0000',
        'message': 'Zoe Leah it’s good to see isn’t it?',
        'id': '3370610399865328'},
       {'created_time': '2023-03-22T08:25:20+0000',
        'message': 'Amanda Mc Loone Definitely!',
        'id': '3370661996526835'}],
      'paging': {'cursors': {'before': 'QVFIUld1S1ZAGTUNFM0VJOXpfSTRiU19BRGZAPWE1qTk5OcnhOeGtzeHhUUTM0Y0VKNmxmSEJ0R0tsUVRLbEJZAZAjR5djJNWmdLNUgyX3NBRjNXYzdiS0h5S1p3',
        'after': 'QVFIUkFhU28wSW5hY1lkbTZATSEo5cjFSMWhBamVxTGVKYkxUcWRrVmxDZAmJqRFE4bWZADbnVPWlNvQUlZAdnhFdUdPdmlkVG84a0V1X0NaQ3QtRkJkMmlRQmpR'}}},
     'id': '3370594463200255'}],
   'paging': {'cursors': {'before': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURVNU5EUTJNekl3TURJMU5Ub3hOamM1TkRZAek5UVXgZD',
     'after': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURVNU5EUTJNekl3TURJMU5Ub3hOamM1TkRZAek5UVXgZD'}}},
  'id': '2139238999669147_3370342969892071'},
...
]

Write a Python function that uses this data to create a dataframe that contains only the replies to the comments.
The dataframe should have these columns: post_id, parent_comment_message, comment_id, reply_comment_message, reply_created_time, reply_id.

### Response
Here's a Python function that takes in the json object and creates a pandas dataframe with the desired columns:
```python
import pandas as pd

def create_reply_dataframe(json_object):
    # Initialize empty lists to store data
    post_id_list = []
    parent_comment_message_list = []
    comment_id_list = []
    reply_comment_message_list = []
    reply_created_time_list = []
    reply_id_list = []
    
    # Loop through each post in the json object
    for post in json_object:
        # Get the post ID
        post_id = post['id']
        
        # Loop through each comment in the post
        for comment in post['comments']['data']:
            # Get the comment ID and message
            comment_id = comment['id']
            parent_comment_message = comment['message']
            
            # Check if the comment has any replies
            if 'comments' in comment:
                # Loop through each reply to the comment
                for reply in comment['comments']['data']:
                    # Get the reply ID, message, and creation time
                    reply_id = reply['id']
                    reply_comment_message = reply['message']
                    reply_created_time = reply['created_time']
                    
                    # Add the data to the lists
                    post_id_list.append(post_id)
                    parent_comment_message_list.append(parent_comment_message)
                    comment_id_list.append(comment_id)
                    reply_comment_message_list.append(reply_comment_message)
                    reply_created_time_list.append(reply_created_time)
                    reply_id_list.append(reply_id)
    
    # Create a dataframe from the lists
    df = pd.DataFrame({
        'post_id': post_id_list,
        'parent_comment_message': parent_comment_message_list,
        'comment_id': comment_id_list,
        'reply_comment_message': reply_comment_message_list,
        'reply_created_time': reply_created_time_list,
        'reply_id': reply_id_list
    })
    
    return df
```
You can use this function like this:
```python
json_object = [{'comments': {'data': [{'created_time': '2023-03-21T20:45:46+0000',
     'message': 'are protein bars bad for you as they are so processed',
     'id': '3370383936554641'},
    {'created_time': '2023-03-21T21:25:54+0000',
     'message': 'Is protein really important ... should do we need to take protein supplements or powder...\nOr',
     'id': '3370399599886408'},
    {'created_time': '2023-03-22T04:34:42+0000',
     'message': '@everyone \n7am Sydney Time Thursday \n8pm Irish time \n\nKeep your eyes peeled!',
     'id': '3370569076536127'}],
   'paging': {'cursors': {'before': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURNNE16a3pOalUxTkRZAME1Ub3hOamM1TkRNeE5UUTMZD',
     'after': 'WTI5dGJXVnVkRjlqZAFhKemIzSTZANek0zTURVMk9UQTNOalV6TmpFeU56b3hOamM1TkRVNU5qZA3oZD'}}},
  'id': '2139238999669147_3370347119891656'},
 {'comments': {'data': [{'created_time': '2023-03-22T05:39:10+0000',
     'message
```

### Test the code

In [None]:
import pandas as pd

def create_reply_dataframe(json_object):
    # Initialize empty lists to store data
    post_id_list = []
    parent_comment_message_list = []
    comment_id_list = []
    reply_comment_message_list = []
    reply_created_time_list = []
    reply_id_list = []
    
    # Loop through each post in the json object
    for post in json_object:
        # Get the post ID
        post_id = post['id']
        
        # Loop through each comment in the post
        for comment in post['comments']['data']:
            # Get the comment ID and message
            comment_id = comment['id']
            parent_comment_message = comment['message']
            
            # Check if the comment has any replies
            if 'comments' in comment:
                # Loop through each reply to the comment
                for reply in comment['comments']['data']:
                    # Get the reply ID, message, and creation time
                    reply_id = reply['id']
                    reply_comment_message = reply['message']
                    reply_created_time = reply['created_time']
                    
                    # Add the data to the lists
                    post_id_list.append(post_id)
                    parent_comment_message_list.append(parent_comment_message)
                    comment_id_list.append(comment_id)
                    reply_comment_message_list.append(reply_comment_message)
                    reply_created_time_list.append(reply_created_time)
                    reply_id_list.append(reply_id)
    
    # Create a dataframe from the lists
    df = pd.DataFrame({
        'post_id': post_id_list,
        'parent_comment_message': parent_comment_message_list,
        'comment_id': comment_id_list,
        'reply_comment_message': reply_comment_message_list,
        'reply_created_time': reply_created_time_list,
        'reply_id': reply_id_list
    })
    
    return df

replies = create_reply_dataframe(comments_response['data'])

TypeError: string indices must be integers

### Modify the code
ChatGPT's code raised the `TypeError` shown above. On inspection, the nested data is not uniform: not all comments have replies. Wrapping the comment loop in `try` and `except` statements, so that entries that fail are skipped, fixed it.

In [None]:
import pandas as pd

def create_reply_dataframe(json_object):
    # Initialize empty lists to store data
    post_id_list = []
    parent_comment_message_list = []
    comment_id_list = []
    reply_comment_message_list = []
    reply_created_time_list = []
    reply_id_list = []
    
    # Loop through each post in the json object
    for post in json_object:
        # Get the post ID
        post_id = post['id']
        
        try: # SH 2023-03-22 10:50 Added this 
            # Loop through each comment in the post
            for comment in post['comments']['data']:
                # Get the comment ID and message
                comment_id = comment['id']
                parent_comment_message = comment['message']
                
                # Check if the comment has any replies
                if 'comments' in comment:
                    # Loop through each reply to the comment
                    for reply in comment['comments']['data']:
                        # Get the reply ID, message, and creation time
                        reply_id = reply['id']
                        reply_comment_message = reply['message']
                        reply_created_time = reply['created_time']
                        
                        # Add the data to the lists
                        post_id_list.append(post_id)
                        parent_comment_message_list.append(parent_comment_message)
                        comment_id_list.append(comment_id)
                        reply_comment_message_list.append(reply_comment_message)
                        reply_created_time_list.append(reply_created_time)
                        reply_id_list.append(reply_id)
        except:
            pass
    
    # Create a dataframe from the lists
    df = pd.DataFrame({
        'post_id': post_id_list,
        'parent_comment_message': parent_comment_message_list,
        'comment_id': comment_id_list,
        'reply_comment_message': reply_comment_message_list,
        'reply_created_time': reply_created_time_list,
        'reply_id': reply_id_list
    })
    
    return df

replies = create_reply_dataframe(comments_response['data'])

In [26]:
replies.head()

Unnamed: 0,post_id,parent_comment_message,comment_id,reply_comment_message,reply_created_time,reply_id
0,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Zoe Leah it’s good to see isn’t it?,2023-03-22T06:20:38+0000,3370610399865328
1,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,"Amanda Mc Loone yes, very much so 🤍",2023-03-22T06:21:05+0000,3370610566531978
2,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Amanda Mc Loone this is so interesting to know...,2023-03-22T06:29:21+0000,3370613349865033
3,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,K Bev Nyirenda scales don’t always show body c...,2023-03-22T07:34:23+0000,3370641649862203
4,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Amanda Mc Loone Definitely!,2023-03-22T08:25:20+0000,3370661996526835
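
The `try`/`except` fix works, but a bare `except` skips a post on any error at all, including ones that would be worth seeing. As a hedged alternative, the sketch below reads the optional keys with `dict.get()` and skips non-dict entries explicitly. It assumes each post and comment is shaped like the JSON in the prompt; the name `extract_replies` and the sample data are hypothetical and not part of the notebook's final script.

```python
import pandas as pd

def extract_replies(posts):
    """Collect one row per reply; posts or comments without replies yield no rows."""
    rows = []
    for post in posts:
        if not isinstance(post, dict):
            continue  # skip stray non-dict entries instead of raising TypeError
        # .get() with empty defaults makes the missing-key case an empty loop
        for comment in post.get("comments", {}).get("data", []):
            for reply in comment.get("comments", {}).get("data", []):
                rows.append({
                    "post_id": post.get("id"),
                    "parent_comment_message": comment.get("message"),
                    "comment_id": comment.get("id"),
                    "reply_comment_message": reply.get("message"),
                    "reply_created_time": reply.get("created_time"),
                    "reply_id": reply.get("id"),
                })
    columns = ["post_id", "parent_comment_message", "comment_id",
               "reply_comment_message", "reply_created_time", "reply_id"]
    # Passing columns keeps the schema stable even when there are no replies
    return pd.DataFrame(rows, columns=columns)

# Made-up sample: one post without comments, one comment without replies,
# and one comment with two replies
sample = [
    {"id": "p1"},
    {"id": "p2", "comments": {"data": [
        {"id": "c1", "message": "no replies here",
         "created_time": "2023-03-22T05:00:00+0000"},
        {"id": "c2", "message": "parent comment",
         "created_time": "2023-03-22T05:39:10+0000",
         "comments": {"data": [
             {"id": "r1", "message": "first reply",
              "created_time": "2023-03-22T06:20:38+0000"},
             {"id": "r2", "message": "second reply",
              "created_time": "2023-03-22T08:25:20+0000"},
         ]}},
    ]}},
]

replies_sketch = extract_replies(sample)
print(replies_sketch[["comment_id", "reply_id"]])
```

Because nothing is swallowed silently, an unexpected error (for example, a response that is not a list of posts at all) would still surface.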


## Prompt 3.5 Merge posts, comments, and replies

I have 3 dataframes:
1. The comments_df dataframe has the following columns: ['post_id', 'created_time', 'message', 'author_id', 'author_name',
       'parent_id', 'parent_name']
2. The posts_df dataframe has the following columns: ['post_id', 'created_time', 'message', 'comments_count', 'like_count',
       'love_count', 'wow_count', 'haha_count', 'sad_count', 'angry_count',
       'permalink_url']
3. The replies dataframe has the following columns: ['post_id', 'parent_comment_message', 'comment_id',
       'reply_comment_message', 'reply_created_time', 'reply_id']

Write a python function to merge the dataframes to create a table with the following columns:
post_id, 
post_message, 
post_created_time, 
comment_created_time, 
comment_message, 
parent_id, 
parent_name,
reply_comment_message, 
reply_created_time, 
reply_id

The table should include all the comments and replies. This means that rows from posts_df and comments_df may be duplicated.

### Response
Here is the Python function to merge the dataframes:
```python
import pandas as pd

def merge_dataframes(comments_df, posts_df, replies_df):
    # merge comments and posts dataframes on post_id
    merged_df = pd.merge(comments_df, posts_df, on='post_id', how='outer')
    # merge replies dataframe on comment_id
    merged_df = pd.merge(merged_df, replies_df, left_on='parent_id', right_on='comment_id', how='left')
    # rename columns
    merged_df = merged_df.rename(columns={
        'created_time_x': 'comment_created_time',
        'created_time_y': 'post_created_time',
        'message_x': 'comment_message',
        'message_y': 'post_message',
        'parent_name_x': 'parent_name',
        'reply_comment_message': 'reply_message',
        'reply_created_time': 'reply_created_time',
        'reply_id': 'reply_id'
    })
    # drop unnecessary columns
    merged_df = merged_df.drop(columns=['parent_id', 'parent_name_y', 'comment_id'])
    # reorder columns
    merged_df = merged_df[['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'comment_message',
                           'parent_id', 'parent_name', 'reply_message', 'reply_created_time', 'reply_id']]
    return merged_df
```


### Test the function

In [None]:
merged = merge_dataframes(comments_df, posts_df, replies)
merged.head()

KeyError: "['parent_name_y'] not found in axis"
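
This `KeyError` is consistent with how `pd.merge()` names columns: a suffix is added only to a column name that exists in both inputs. `parent_name` exists only in `comments_df`, so no `parent_name_y` is ever created. A small sketch on made-up frames (the names `left` and `right` are illustrative) reproduces the behaviour and shows the `errors="ignore"` option of `DataFrame.drop()`:

```python
import pandas as pd

left = pd.DataFrame({"post_id": ["p1"], "message": ["a comment"], "parent_name": [None]})
right = pd.DataFrame({"post_id": ["p1"], "message": ["a post"]})

both = pd.merge(left, right, on="post_id", how="outer")
print(both.columns.tolist())
# ['post_id', 'message_x', 'parent_name', 'message_y']

# Dropping a column that does not exist raises KeyError by default
try:
    both.drop(columns=["parent_name_y"])
except KeyError as err:
    print("KeyError:", err)

# errors="ignore" skips missing labels instead of raising
cleaned = both.drop(columns=["parent_name_y"], errors="ignore")
print(cleaned.columns.tolist())
```

Printing the columns right after each merge, as the next iteration does, is usually the quickest way to see which names actually exist.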

### Modify the code
#### Iteration 1

In [42]:

def merge_dataframes(comments_df, posts_df, replies_df):
    # merge comments and posts dataframes on post_id
    merged_df = pd.merge(comments_df, posts_df, on='post_id', how='outer')
    # merge replies dataframe on comment_id
    merged_df2 = pd.merge(merged_df, replies_df, left_on='parent_id', right_on='comment_id', how='left')
    # rename columns
    # merged_df2 = merged_df.rename(columns={
    #     'post_id_x': 'post_id',
    #     'created_time_x': 'comment_created_time',
    #     'created_time_y': 'post_created_time',
    #     'message_x': 'comment_message',
    #     'message_y': 'post_message',
    #     'parent_name_x': 'parent_name',
    #     'reply_comment_message': 'reply_message',
    #     'reply_created_time': 'reply_created_time',
    #     'reply_id': 'reply_id'
    # })
    # drop unnecessary columns
    # merged_df2 = merged_df.drop(columns=['parent_id', 'parent_name', 'comment_id'])
    # # reorder columns
    # merged_df = merged_df[['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'parent_comment_message',
    #                        'parent_id', 'parent_name', 'reply_message', 'reply_created_time', 'reply_id']]
    return merged_df, merged_df2

merged, merged2 = merge_dataframes(comments_df, posts_df, replies)
print(merged.columns)
merged.head()

Index(['post_id', 'created_time_x', 'message_x', 'author_id', 'author_name',
       'parent_id', 'parent_name', 'created_time_y', 'message_y',
       'comments_count', 'like_count', 'love_count', 'wow_count', 'haha_count',
       'sad_count', 'angry_count', 'permalink_url'],
      dtype='object')


Unnamed: 0,post_id,created_time_x,message_x,author_id,author_name,parent_id,parent_name,created_time_y,message_y,comments_count,like_count,love_count,wow_count,haha_count,sad_count,angry_count,permalink_url
0,2139238999669147_3370347119891656,2023-03-21T20:45:46+0000,are protein bars bad for you as they are so pr...,,,,,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
1,2139238999669147_3370347119891656,2023-03-21T21:25:54+0000,Is protein really important ... should do we n...,,,,,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
2,2139238999669147_3370347119891656,2023-03-21T23:31:32+0000,"For those of us entering perimenopause, what s...",,,,,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
3,2139238999669147_3370347119891656,2023-03-22T03:42:17+0000,What snacks should we lean more towards when s...,,,,,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
4,2139238999669147_3370347119891656,2023-03-22T04:33:53+0000,These are bloody fantastic questions! I will c...,,,,,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...


In [40]:
print(merged2.columns)
merged2['comment_id']

Index(['post_id_x', 'created_time_x', 'message_x', 'author_id', 'author_name',
       'parent_id', 'parent_name', 'created_time_y', 'message_y',
       'comments_count', 'like_count', 'love_count', 'wow_count', 'haha_count',
       'sad_count', 'angry_count', 'permalink_url', 'post_id_y',
       'parent_comment_message', 'comment_id', 'reply_comment_message',
       'reply_created_time', 'reply_id'],
      dtype='object')


0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
5     NaN
6     NaN
7     NaN
8     NaN
9     NaN
10    NaN
11    NaN
12    NaN
13    NaN
14    NaN
15    NaN
16    NaN
17    NaN
18    NaN
19    NaN
20    NaN
21    NaN
22    NaN
23    NaN
24    NaN
25    NaN
26    NaN
27    NaN
28    NaN
29    NaN
30    NaN
31    NaN
32    NaN
33    NaN
34    NaN
35    NaN
36    NaN
37    NaN
38    NaN
39    NaN
40    NaN
Name: comment_id, dtype: object
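
A column that is entirely `NaN` after a left merge usually means that no key on the left matched any key on the right. One way to confirm this, sketched here on made-up values rather than the real data, is the `indicator=True` option of `pd.merge()` together with a direct comparison of the key sets. The `*_toy` frames are assumptions for illustration.

```python
import pandas as pd

# Toy frames that mimic the situation: parent_id holds reply IDs, not comment IDs
comments_toy = pd.DataFrame({
    "message": ["Ree Waugh", "Thank you x"],
    "parent_id": ["r1", "r9"],
})
replies_toy = pd.DataFrame({
    "comment_id": ["c1", "c1"],
    "reply_id": ["r1", "r2"],
})

check = pd.merge(
    comments_toy, replies_toy,
    left_on="parent_id", right_on="comment_id",
    how="left", indicator=True,  # adds a _merge column: left_only / right_only / both
)
print(check["_merge"].value_counts())

# Compare the key sets directly to see which column the IDs really line up with
overlap_comment = set(comments_toy["parent_id"]) & set(replies_toy["comment_id"])
overlap_reply = set(comments_toy["parent_id"]) & set(replies_toy["reply_id"])
print(overlap_comment)  # set()
print(overlap_reply)    # {'r1'}
```

If every row is `left_only`, the join keys are the problem rather than the rename or drop steps.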

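#### Iteration 2
I examined the output above and debugged the function. I found that:
* The rename and drop steps referred to columns that do not exist after the merge (for example, `parent_name_y`).
* The second merge matched nothing because the IDs do not correspond: `parent_id` in `comments_df` holds the ID of a comment's first reply, while `comment_id` in `replies` holds the ID of the comment itself.

The modified version below fixes these issues by joining on the comment's message text instead.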

In [49]:
# This will make up the final script for the task 
def merge_dataframes(comments_df, posts_df, replies_df):
    # merge comments and posts dataframes on post_id
    merged_df = pd.merge(comments_df, posts_df, on='post_id', how='outer')
    # merge replies onto comments by matching the comment's message text
    merged_df = pd.merge(merged_df, replies_df, left_on='message_x', right_on='parent_comment_message', how='left')
    # rename columns
    merged_df = merged_df.rename(columns={
        'post_id_x': 'post_id',
        'created_time_x': 'comment_created_time',
        'created_time_y': 'post_created_time',
        'message_x': 'comment_message',
        'message_y': 'post_message',
        'parent_name_x': 'parent_name',
        'reply_comment_message': 'reply_message',
        'reply_created_time': 'reply_created_time',
        'reply_id': 'reply_id'
    })
    # drop unnecessary columns
    merged_df = merged_df.drop(columns=['parent_name', 'parent_id'])
    # reorder columns
    merged_df = merged_df[['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'parent_comment_message',
                           'comment_id', 'reply_message', 'reply_created_time', 'reply_id']]
    return merged_df

merged = merge_dataframes(comments_df, posts_df, replies)
print(merged.columns)
merged.head(10)

Index(['post_id', 'post_message', 'post_created_time', 'comment_created_time',
       'parent_comment_message', 'comment_id', 'reply_message',
       'reply_created_time', 'reply_id'],
      dtype='object')


Unnamed: 0,post_id,post_message,post_created_time,comment_created_time,parent_comment_message,comment_id,reply_message,reply_created_time,reply_id
0,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T20:45:46+0000,,,,,
1,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T21:25:54+0000,,,,,
2,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T23:31:32+0000,,,,,
3,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T03:42:17+0000,,,,,
4,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:33:53+0000,,,,,
5,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:34:42+0000,,,,,
6,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T18:53:06+0000,,,,,
7,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,Zoe Leah it’s good to see isn’t it?,2023-03-22T06:20:38+0000,3370610399865328.0
8,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,"Amanda Mc Loone yes, very much so 🤍",2023-03-22T06:21:05+0000,3370610566531978.0
9,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,Amanda Mc Loone this is so interesting to know...,2023-03-22T06:29:21+0000,3370613349865033.0
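
Joining on message text works for this data, but two comments with identical text (for example, two people writing "Thank you x") would each pick up the other's replies. The sketch below shows a stricter join on IDs. It assumes `comments_df` also carried each comment's own ID in a `comment_id` column, which the notebook's `get_fb_comments` does not currently store, so the frames here are made-up stand-ins.

```python
import pandas as pd

# Two different comments with the same text, on different posts
comments_toy = pd.DataFrame({
    "post_id": ["p1", "p2"],
    "comment_id": ["c1", "c2"],  # assumed column, not in the notebook's comments_df
    "message": ["Thank you x", "Thank you x"],
})
# Only the first comment actually has a reply
replies_toy = pd.DataFrame({
    "post_id": ["p1"],
    "comment_id": ["c1"],
    "parent_comment_message": ["Thank you x"],
    "reply_comment_message": ["You're welcome"],
})

# Text join: the single reply is attached to both comments
by_text = pd.merge(comments_toy, replies_toy,
                   left_on="message", right_on="parent_comment_message", how="left")

# ID join: the reply is attached only to the comment it belongs to
by_id = pd.merge(comments_toy, replies_toy,
                 on=["post_id", "comment_id"], how="left")

print(by_text["reply_comment_message"].notna().sum())  # 2
print(by_id["reply_comment_message"].notna().sum())    # 1
```

Keeping the IDs as strings on both sides avoids a second class of mismatch, where one side has been converted to a number.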


# Putting it all together

In [50]:
## I will use this as part of the final code for the task
import requests
import pandas as pd

def get_facebook_posts(group_id, access_token, limit=25):
    """
    Makes a GET request to pull posts from a Facebook group using the Facebook API.
    Returns the parsed JSON response; pass it to process_response() to get a DataFrame in which
    each post is a row with its time stamp, message, permalink, comments count, reactions, and post ID.

    Parameters:
        group_id (str): The ID of the Facebook group to pull posts from.
        access_token (str): An access token with the appropriate permissions to access the group's content.
        limit (int): The number of posts to retrieve (default is 25).

    Returns:
        dict: The parsed JSON response (response.json()) from the API.
    """
    url = f"https://graph.facebook.com/{group_id}/feed"

    params = {
        "access_token": access_token,
        "fields": "created_time,message,permalink_url,comments.limit(10).summary(true),reactions.limit(500).summary(true).type(LIKE).as(like),reactions.limit(500).summary(true).type(LOVE).as(love),reactions.limit(500).summary(true).type(WOW).as(wow),reactions.limit(500).summary(true).type(HAHA).as(haha),reactions.limit(500).summary(true).type(SAD).as(sad),reactions.limit(500).summary(true).type(ANGRY).as(angry)",
        "limit": limit
    }
    response = requests.get(url, params=params)
    status_code = response.status_code
    print("Response Status Code:", status_code)
    data = response.json()
    return data

## I will use this as part of the final code for the task
def process_response(data):
    posts = []
    comments = []
    for post in data["data"]:
        created_time = post.get("created_time")
        message = post.get("message")
        permalink_url = post.get("permalink_url")
        comments_count = post.get("comments", {}).get("summary", {}).get("total_count", 0)
        # Total count per reaction type, in a fixed order (0 when the alias is missing)
        reactions = [
            post.get(kind, {}).get("summary", {}).get("total_count", 0)
            for kind in ("like", "love", "wow", "haha", "sad", "angry")
        ]
        post_id = post.get("id")
        posts.append({
            "post_id": post_id,
            "created_time": created_time,
            "message": message,
            "comments_count": comments_count,
            "like_count": reactions[0],
            "love_count": reactions[1],
            "wow_count": reactions[2],
            "haha_count": reactions[3],
            "sad_count": reactions[4],
            "angry_count": reactions[5],
            "permalink_url": permalink_url,
        })

    df = pd.DataFrame(posts)
    # Reorder columns
    df = df[["post_id", "created_time", "message", "comments_count", "like_count", "love_count", "wow_count", "haha_count", "sad_count", "angry_count", "permalink_url"]]
    
    return df

def get_fb_comments(group_id, access_token, limit=25):
    """
    Pulls comments from a Facebook group using the Facebook API and outputs the results into a pandas DataFrame.
    
    Parameters:
    group_id (str): The ID of the Facebook group to pull comments from.
    access_token (str): A Facebook access token with the necessary permissions to access the group.
    limit (int): The number of posts whose comments are requested (default is 25).
    
    Returns:
    Tuple(pandas.DataFrame, dict): A tuple containing a DataFrame with each comment as a row along with its details and replies,
    and the full JSON response from the API.
    """
    # Set up the API endpoint and parameters
    url = f"https://graph.facebook.com/{group_id}/feed"
    params = {
        "fields": "comments{created_time,message,from,comments{created_time,message,from}}",
        "access_token": access_token,
        "limit": limit
    }

    # Make the GET request
    response = requests.get(url, params=params)
    
    # Convert the response to a DataFrame
    comments = []
    for post in response.json()["data"]:
        post_comments = post.get("comments", {}).get("data", [])
        for comment in post_comments:
            comment_dict = {
                "post_id": post.get("id"),
                "created_time": comment.get("created_time"),
                "message": comment.get("message"),
                "author_id": comment.get("from", {}).get("id"),
                "author_name": comment.get("from", {}).get("name"),
                "parent_id": None,
                "parent_name": None,
            }
            # Note: despite the names, parent_id and parent_name are filled from the comment's first reply
            parent_comment = comment.get("comments", {}).get("data", [])
            if parent_comment:
                parent_comment = parent_comment[0]
                comment_dict["parent_id"] = parent_comment.get("id")
                comment_dict["parent_name"] = parent_comment.get("from", {}).get("name")
            comments.append(comment_dict)
    df = pd.DataFrame(comments)
    return df, response.json()

def create_reply_dataframe(json_object):
    # Initialize empty lists to store data
    post_id_list = []
    parent_comment_message_list = []
    comment_id_list = []
    reply_comment_message_list = []
    reply_created_time_list = []
    reply_id_list = []
    
    # Loop through each post in the json object
    for post in json_object:
        # Get the post ID
        post_id = post['id']
        
        try: # SH 2023-03-22 10:50 Added this 
            # Loop through each comment in the post
            for comment in post['comments']['data']:
                # Get the comment ID and message
                comment_id = comment['id']
                parent_comment_message = comment['message']
                
                # Check if the comment has any replies
                if 'comments' in comment:
                    # Loop through each reply to the comment
                    for reply in comment['comments']['data']:
                        # Get the reply ID, message, and creation time
                        reply_id = reply['id']
                        reply_comment_message = reply['message']
                        reply_created_time = reply['created_time']
                        
                        # Add the data to the lists
                        post_id_list.append(post_id)
                        parent_comment_message_list.append(parent_comment_message)
                        comment_id_list.append(comment_id)
                        reply_comment_message_list.append(reply_comment_message)
                        reply_created_time_list.append(reply_created_time)
                        reply_id_list.append(reply_id)
        except:
            pass
    
    # Create a dataframe from the lists
    df = pd.DataFrame({
        'post_id': post_id_list,
        'parent_comment_message': parent_comment_message_list,
        'comment_id': comment_id_list,
        'reply_comment_message': reply_comment_message_list,
        'reply_created_time': reply_created_time_list,
        'reply_id': reply_id_list
    })
    return df

def merge_dataframes(comments_df, posts_df, replies_df):
    # merge comments and posts dataframes on post_id
    merged_df = pd.merge(comments_df, posts_df, on='post_id', how='outer')
    # merge replies onto comments by matching the comment's message text
    merged_df = pd.merge(merged_df, replies_df, left_on='message_x', right_on='parent_comment_message', how='left')
    # rename columns
    merged_df = merged_df.rename(columns={
        'post_id_x': 'post_id',
        'created_time_x': 'comment_created_time',
        'created_time_y': 'post_created_time',
        'message_x': 'comment_message',
        'message_y': 'post_message',
        'parent_name_x': 'parent_name',
        'reply_comment_message': 'reply_message',
        'reply_created_time': 'reply_created_time',
        'reply_id': 'reply_id'
    })
    # drop unnecessary columns
    merged_df = merged_df.drop(columns=['parent_name', 'parent_id'])
    # reorder columns
    merged_df = merged_df[['post_id', 'post_message', 'post_created_time', 'comment_created_time', 'parent_comment_message',
                           'comment_id', 'reply_message', 'reply_created_time', 'reply_id']]
    return merged_df

In [4]:
# Get Facebook group posts
posts_data = get_facebook_posts(group_id, access_token)
posts_df = process_response(posts_data)
posts_df.head(10)

Unnamed: 0,post_id,created_time,message,comments_count,like_count,love_count,wow_count,haha_count,sad_count,angry_count,permalink_url
0,2139238999669147_3370989849827383,2023-03-22T20:00:34+0000,"Q&A - protein, protein bars, does training aff...",0,4,1,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
1,2139238999669147_3370971416495893,2023-03-22T19:16:23+0000,Almost half of the world's population are not ...,0,3,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
2,2139238999669147_3370347119891656,2023-03-21T19:11:52+0000,@everyone for team training this week. A vast ...,7,2,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
3,2139238999669147_3370342969892071,2023-03-21T19:02:42+0000,How shocked would you be if I told you - there...,1,3,6,3,0,0,0,https://www.facebook.com/groups/strongandsassy...
4,2139238999669147_3369703796622655,2023-03-20T18:56:34+0000,Why your morning routine may actually be NEGAT...,1,5,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
5,2139238999669147_3367071606885874,2023-03-16T18:50:55+0000,If you’re constantly worried about calories an...,0,4,1,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
6,2139238999669147_3366477613611940,2023-03-15T20:56:22+0000,You’ve heard the saying ‘a dog is for life - n...,0,1,4,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
7,2139238999669147_3366060270320341,2023-03-15T07:02:37+0000,Calories in versus calories out (how do I over...,2,5,3,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
8,2139238999669147_3365771280349240,2023-03-14T19:04:59+0000,"The humble back photo 🔙\n\nIn my opinion, usua...",6,15,6,0,0,0,0,https://www.facebook.com/groups/strongandsassy...
9,2139238999669147_3365106907082344,2023-03-13T18:45:59+0000,It would actually break your heart to read my ...,0,3,0,0,0,0,0,https://www.facebook.com/groups/strongandsassy...


In [5]:
comments_df, comments_response = get_fb_comments(group_id, access_token)
comments_df.head(10)


Unnamed: 0,post_id,created_time,message,author_id,author_name,parent_id,parent_name
0,2139238999669147_3370347119891656,2023-03-21T20:45:46+0000,are protein bars bad for you as they are so pr...,,,,
1,2139238999669147_3370347119891656,2023-03-21T21:25:54+0000,Is protein really important ... should do we n...,,,,
2,2139238999669147_3370347119891656,2023-03-21T23:31:32+0000,"For those of us entering perimenopause, what s...",,,,
3,2139238999669147_3370347119891656,2023-03-22T03:42:17+0000,What snacks should we lean more towards when s...,,,,
4,2139238999669147_3370347119891656,2023-03-22T04:33:53+0000,These are bloody fantastic questions! I will c...,,,,
5,2139238999669147_3370347119891656,2023-03-22T04:34:42+0000,@everyone \n7am Sydney Time Thursday \n8pm Iri...,,,,
6,2139238999669147_3370347119891656,2023-03-22T18:53:06+0000,Any snack/food ideas for when your out all day...,,,,
7,2139238999669147_3370342969892071,2023-03-22T05:39:10+0000,Ree Waugh,,,3370610399865328.0,
8,2139238999669147_3369703796622655,2023-03-21T04:42:48+0000,Set my alarm for 5am today to walk Eddie and b...,,,3369948876598147.0,
9,2139238999669147_3366060270320341,2023-03-15T13:38:20+0000,Thank you for the info Amanda. I'm struggling...,,,3366487800277588.0,


In [6]:
replies = create_reply_dataframe(comments_response['data'])
replies.head()

Unnamed: 0,post_id,parent_comment_message,comment_id,reply_comment_message,reply_created_time,reply_id
0,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Zoe Leah it’s good to see isn’t it?,2023-03-22T06:20:38+0000,3370610399865328
1,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,"Amanda Mc Loone yes, very much so 🤍",2023-03-22T06:21:05+0000,3370610566531978
2,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Amanda Mc Loone this is so interesting to know...,2023-03-22T06:29:21+0000,3370613349865033
3,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,K Bev Nyirenda scales don’t always show body c...,2023-03-22T07:34:23+0000,3370641649862203
4,2139238999669147_3370342969892071,Ree Waugh,3370594463200255,Amanda Mc Loone Definitely!,2023-03-22T08:25:20+0000,3370661996526835


In [52]:
merged = merge_dataframes(comments_df, posts_df, replies)
merged

Unnamed: 0,post_id,post_message,post_created_time,comment_created_time,parent_comment_message,comment_id,reply_message,reply_created_time,reply_id
0,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T20:45:46+0000,,,,,
1,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T21:25:54+0000,,,,,
2,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-21T23:31:32+0000,,,,,
3,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T03:42:17+0000,,,,,
4,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:33:53+0000,,,,,
5,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T04:34:42+0000,,,,,
6,2139238999669147_3370347119891656,@everyone for team training this week. A vast ...,2023-03-21T19:11:52+0000,2023-03-22T18:53:06+0000,,,,,
7,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,Zoe Leah it’s good to see isn’t it?,2023-03-22T06:20:38+0000,3370610399865328.0
8,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,"Amanda Mc Loone yes, very much so 🤍",2023-03-22T06:21:05+0000,3370610566531978.0
9,2139238999669147_3370342969892071,How shocked would you be if I told you - there...,2023-03-21T19:02:42+0000,2023-03-22T05:39:10+0000,Ree Waugh,3370594463200255.0,Amanda Mc Loone this is so interesting to know...,2023-03-22T06:29:21+0000,3370613349865033.0


### The resulting dataframe contains:
- Posts from a given Facebook group. The function lets the user specify how many posts to retrieve.
- All the comments from the retrieved posts.
- All the replies to those comments.

The intention was to put all the posts, comments, and replies into a single table. As a result, the table contains both null values and duplicate values:
- Not all posts have comments, and not all comments have replies, so some cells are empty.
- Some posts have multiple comments, and some comments have multiple replies, so post and comment data is repeated across rows.
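
Both effects can be reproduced on a toy example. The sketch below uses made-up IDs and messages and assumes left joins on `post_id` and `comment_id`; the actual `merge_dataframes` function may combine the tables differently.

```python
import pandas as pd

# Toy data with hypothetical values (not taken from the group above).
posts = pd.DataFrame({
    "post_id": ["p1", "p2", "p3"],
    "post_message": ["first post", "second post", "third post"],
})
comments = pd.DataFrame({
    "post_id": ["p1", "p1", "p2"],
    "comment_id": ["c1", "c2", "c3"],
    "comment_message": ["comment A", "comment B", "comment C"],
})
replies = pd.DataFrame({
    "comment_id": ["c1", "c1"],
    "reply_id": ["r1", "r2"],
    "reply_message": ["reply X", "reply Y"],
})

# Left joins keep every post and every comment, even when nothing matches.
flat = (
    posts.merge(comments, on="post_id", how="left")
         .merge(replies, on="comment_id", how="left")
)

# p1 is repeated once per comment/reply pair; p3 has no comments,
# so its comment and reply columns are null.
print(flat)
```

In this toy table, `p1` appears on three rows (two replies to `c1`, plus `c2`), while `p3` appears once with empty comment and reply columns.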

Depending on the task, the dataframe above can be wrangled further.
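
As one possible direction, the sketch below shows a few common clean-up steps on a small stand-in for the merged table. The column names match the merged output, but the values are made up, and the choice of steps (parsing timestamps, restoring integer IDs, counting replies, de-duplicating posts) is only an assumption about what a downstream task might need.

```python
import pandas as pd

# Small stand-in for the merged table; all values are hypothetical.
merged = pd.DataFrame({
    "post_id": ["g_1", "g_1", "g_2", "g_2"],
    "post_created_time": [
        "2023-03-21T19:11:52+0000", "2023-03-21T19:11:52+0000",
        "2023-03-21T19:02:42+0000", "2023-03-21T19:02:42+0000",
    ],
    "comment_id": [None, None, 101.0, 101.0],
    "reply_id": [None, None, 201.0, 202.0],
})

# Parse the timestamp strings into timezone-aware datetimes.
merged["post_created_time"] = pd.to_datetime(
    merged["post_created_time"], format="%Y-%m-%dT%H:%M:%S%z", utc=True
)

# Missing values turn ID columns into floats (e.g. 101.0);
# the nullable Int64 dtype restores integers while keeping the gaps.
for col in ["comment_id", "reply_id"]:
    merged[col] = merged[col].astype("Int64")

# Count distinct replies per post; nulls are ignored by nunique().
reply_counts = merged.groupby("post_id")["reply_id"].nunique()

# Recover one row per post by dropping the repeated post data.
posts_only = merged.drop_duplicates(subset="post_id")[["post_id", "post_created_time"]]

print(reply_counts)
print(posts_only)
```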


# *End of Page*