<a href="https://colab.research.google.com/github/selgebali/Colabs/blob/main/DataCite_2024_public_data_file_blog_post_(Jan_2025).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ReadMe File

## Resource Type Distribution Chart

### Overview

This script connects to a Google Sheet, retrieves data, and creates an interactive doughnut chart using Plotly. The chart visualizes the distribution of resource types and their respective counts.
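
The percentages shown in the chart are each resource type's share of the total count. Plotly computes them itself; the following is only a minimal sketch of that arithmetic, using hypothetical counts. The names `percent_shares` and `sample_counts` are illustrative and not part of the script.

```python
def percent_shares(counts):
    """Return each category's share of the total as a percentage."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when every count is 0
        return {name: 0.0 for name in counts}
    return {name: 100 * value / total for name, value in counts.items()}


# Hypothetical counts, not taken from the Google Sheet
sample_counts = {"Dataset": 50, "Text": 30, "Software": 20}
shares = percent_shares(sample_counts)
```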

### Prerequisites
	•	Google Account: Required to authenticate and access Google Sheets.
	•	Google Sheet Permissions: The Google Sheet must be shared with the Google account you authenticate with (or with the client email of your credentials, if you use a service account).
	•	Python Libraries: Install the following libraries:
		•	pandas: For data manipulation.
		•	plotly: For creating the chart.
		•	gspread: For accessing Google Sheets.
		•	google-auth: For authentication.
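
Before running the script, it may help to confirm that the libraries above are importable in the current environment. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper name, and the list holds import names (google-auth is imported as `google.auth`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. `google`) is absent
            found = False
        if not found:
            missing.append(name)
    return missing


# Import names of the libraries listed above
required = ["pandas", "plotly", "gspread", "google.auth"]
print("Missing modules:", missing_modules(required) or "none")
```

Any module reported as missing would need to be installed before the script can run.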


## Steps to Use

```
1.  In the Google Sheet, select the column that holds the counts and set Format --> Number --> Automatic.
2.  Replace sheet_url with the URL of your Google Sheet.
3.  Check the sharing permissions: the Google Sheet must be shared with the account (or the client email in your credentials) that runs the script.
4.  Run the script in Google Colab; it relies on google.colab for authentication.
```
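
Step 1 matters because the script converts the count column with `pd.to_numeric(..., errors='coerce')` and replaces anything it cannot parse with 0, so a cell formatted as `1,234` would silently count as zero. The plain-Python sketch below shows one way such text could be cleaned first. `parse_count` is a hypothetical helper, and it assumes commas are thousands separators.

```python
def parse_count(cell):
    """Convert a cell's text to a float, returning 0.0 when it is not a number.

    Assumption: commas are thousands separators (e.g. "1,234").
    """
    cleaned = str(cell).replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return 0.0


# Hypothetical cell values, not taken from the Google Sheet
examples = ["1234", "1,234", " 56 ", "", "n/a"]
parsed = [parse_count(text) for text in examples]
```

If needed, a helper like this could be applied to the count column, for example with `df.iloc[:, 1].map(parse_count)`, before the data is sorted.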
## Key Features
	•	Google Sheets API Integration: Secure access to Google Sheets.
	•	Interactive Doughnut Chart: Displays resource types, percentages, and counts.
	•	Dynamic Colors: Custom palette for visual distinction.
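
The color assignment can be sketched as follows: the largest slices take colors from a palette, and any remaining slices fall back to grey. `build_colors` is a hypothetical helper, not a function from the script, and the three-color palette is only for illustration.

```python
def build_colors(n_slices, palette, fallback="#cccccc"):
    """Return exactly n_slices colors: palette entries first, then the fallback."""
    colors = list(palette[:n_slices])
    colors += [fallback] * (n_slices - len(colors))
    return colors


# Hypothetical three-color palette applied to five slices
demo = build_colors(5, ["#2a4d69", "#00B1E2", "#5b88b9"])
```

Deriving the number of grey entries from the palette length keeps the color list the same length as the data, whatever the number of categories.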

In [None]:
import gspread  # Library for interacting with Google Sheets
import pandas as pd
import plotly.graph_objects as go
from google.auth import default  # Default authentication method for Google APIs
from google.colab import auth  # For Google Colab-specific authentication

# Authenticate and connect to Google Sheets
auth.authenticate_user()  # Authenticates the user in Google Colab
creds, _ = default()  # Retrieve the credentials
gc = gspread.authorize(creds)  # Authorize gspread with the credentials

# Load the data from the Google Sheet
sheet_url = "https://docs.google.com/spreadsheets/d/13q34AW6DBNaSE5wCOb-Cp1aJ6Q3UEevv13ROEpIFZKo/edit?gid=175982832#gid=175982832"

# Open the Google Sheet by its URL
sheet = gc.open_by_url(sheet_url)
# Get the first worksheet (tab); note that the gid in the URL is not used to select the tab
worksheet = sheet.get_worksheet(0)

# Read data into a pandas DataFrame
values = worksheet.get_all_values()  # Fetch the sheet once instead of once per use
df = pd.DataFrame(values[1:], columns=values[0])
# Explanation:
# - `get_all_values()` retrieves all rows from the worksheet as a list of lists.
# - `values[1:]` skips the header row (first row).
# - `columns=values[0]` uses the first row as column headers.

# Convert the 'Count' column (Column B: index 1) to numeric and replace invalid entries with 0
count_col = df.columns[1]
df[count_col] = pd.to_numeric(df[count_col], errors='coerce').fillna(0)
# Explanation:
# - Converts the second column (index 1) to numeric, setting invalid entries to `NaN`.
# - Replaces `NaN` with `0`.
# - Assigning by column name replaces the column, so it gets a numeric dtype.

# Sort the data by count (Column B: index 1) in descending order
df = df.sort_values(df.columns[1], ascending=False).reset_index(drop=True)
# Explanation:
# - Sorts the data based on the second column in descending order.
# - Resets the index after sorting.

# Define a muted color palette for the largest categories, and grey for the rest
palette = ['#2a4d69', '#00B1E2', '#5b88b9', '#46BCAB', '#90d7cd', '#BC2B66', '#eeee98', '#F07C73']
colors = palette + ['#cccccc'] * max(len(df) - len(palette), 0)
# Explanation:
# - The largest categories (one per palette entry, 8 here) are assigned specific colors.
# - Remaining categories are colored grey.

# Create hover text for all categories, even the small ones
hover_text = [f'{df.iloc[i, 0]}: {df.iloc[i, 1]:,.0f}' for i in range(len(df))]
# Explanation:
# - Generates hover text for each category in the format: "Category: Count".

# Plotly figure for an interactive doughnut chart
fig = go.Figure(go.Pie(
    labels=df.iloc[:, 0],  # Column A for resource types
    values=df.iloc[:, 1],  # Column B for counts
    hovertext=hover_text,  # Use the "Category: Count" strings built above
    hoverinfo='text+percent',  # Display the custom text and the percentage on hover
    textinfo='value+label',  # Display value and label on the slices
    hole=0.4,  # Doughnut hole size
    textposition='inside',  # Position text inside the slices
    insidetextorientation='radial',  # Radial text orientation for better readability
    insidetextfont=dict(size=14, color='white'),  # Font size and color for text inside slices
    marker=dict(colors=colors, line=dict(color='#FFFFFF', width=2))  # Slice colors and white borders
))

# Update layout to add legend and ensure the chart is a circle
fig.update_layout(
    title_text="Resource Type General Distribution",  # Chart title
    annotations=[dict(text='Resources', x=0.5, y=0.5, font_size=20, showarrow=False)],  # Annotation inside the doughnut
    showlegend=True,  # Enable the legend to show all categories
    legend=dict(yanchor="top", y=1, xanchor="left", x=1.5),  # Place legend outside the chart
    margin=dict(l=50, r=50, t=50, b=50),  # Adjust margins for better visualization
    height=1200,  # Chart height
    width=1500,  # Chart width
    plot_bgcolor='white',  # Set plot background to white
    paper_bgcolor='white'  # Set entire figure background to white
)

# Display the figure
fig.show()
