
# Task
Display the capital of each state.

## Data collection

### Subtask:
Find a reliable source for US state capitals data.


## Data loading

### Subtask:
Load the state capitals data into a pandas DataFrame.


**Reasoning**:
Import pandas and load the state capitals data into a DataFrame, reading a local `state_capitals.csv` file if one exists and otherwise falling back to a Wikipedia table.



In [4]:
import pandas as pd
# Assuming the data source identified in the previous step was a CSV file named 'state_capitals.csv'
# If the source is different, this code will need to be adjusted accordingly.
try:
  df_capitals = pd.read_csv('state_capitals.csv')
except FileNotFoundError:
  # If the file is not found, we will try to read from a common online source like Wikipedia
  print("Local file 'state_capitals.csv' not found. Attempting to read from Wikipedia.")
  try:
    url = 'https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States'
    tables = pd.read_html(url)
    # Assume the state capitals table is the first one; inspect `tables` to confirm
    df_capitals = tables[0]
    print("Successfully read data from Wikipedia.")
  except Exception as e:
    print(f"Could not read data from Wikipedia. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Local file 'state_capitals.csv' not found. Attempting to read from Wikipedia.
Could not read data from Wikipedia. Error: HTTP Error 403: Forbidden
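A likely cause of the 403 is the request's `User-Agent` header rather than the URL itself: given a URL, `pd.read_html` downloads the page with Python's `urllib`, whose default agent string (`Python-urllib/3.x`) Wikimedia sites tend to reject. The sketch below assumes that diagnosis: it fetches the page with an explicit agent and hands the HTML to pandas. The agent string and contact address are placeholders, and `build_request`/`fetch_html` are hypothetical helper names.

```python
import urllib.request

# Placeholder agent string (assumption): Wikimedia's policy asks clients to
# identify themselves descriptively, ideally with contact details.
USER_AGENT = "state-capitals-notebook/0.1 (contact: you@example.com)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Wrap a URL in a Request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page and decode its body as text (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)


# Usage in the notebook (not executed here):
# import io
# import pandas as pd
# html = fetch_html("https://en.wikipedia.org/wiki/List_of_capitals_in_the_United_States")
# tables = pd.read_html(io.StringIO(html))  # wrap in StringIO: literal HTML strings are deprecated
```

Whether this clears the 403 depends on the site's current policy, so it is worth trying before abandoning a source rather than a guaranteed fix.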


**Reasoning**:
Neither the local file nor the English Wikipedia page could be read, so I will try a similar list on Simple English Wikipedia with `read_html`.



In [5]:
try:
    url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states_by_capital_city'
    tables = pd.read_html(url)
    # The table with state capitals is usually the first one
    df_capitals = tables[0]
    print("Successfully read data from simple Wikipedia.")
except Exception as e:
    print(f"Could not read data from simple Wikipedia. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Could not read data from simple Wikipedia. Error: HTTP Error 403: Forbidden


**Reasoning**:
Both Wikipedia requests failed with HTTP Error 403: Forbidden. A different kind of source, such as a government data portal or an online JSON or Excel file, may be needed, but first I will try one more English Wikipedia page, this time scanning all of its tables for state and capital columns instead of assuming the first table is the right one.



In [6]:
try:
    url = 'https://en.wikipedia.org/wiki/List_of_state_and_territory_capitals_in_the_United_States'
    tables = pd.read_html(url)
    # The table with state capitals is usually one of the first ones.
    # We need to inspect the tables to find the correct one.
    df_capitals = None
    for i, table in enumerate(tables):
        # Look for a table with columns that likely contain state and capital names
        if any(col in table.columns for col in ['State', 'Capital', 'State or territory', 'Capital city']):
            df_capitals = table
            print(f"Found potential table at index {i}.")
            break

    if df_capitals is not None:
        print("Successfully read data from Wikipedia.")
    else:
        print("Could not find a suitable table on the Wikipedia page.")

except Exception as e:
    print(f"Could not read data from Wikipedia. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Could not read data from Wikipedia. Error: HTTP Error 403: Forbidden
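Cells In [4] through In [6] repeat the same try/except block once per URL. A minimal sketch that folds that pattern into a loop over candidate sources, recording each failure along the way, is shown below; `first_successful` is a hypothetical helper, and the broad `except Exception` mirrors the notebook cells rather than being a recommendation.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


def first_successful(
    sources: Iterable[str], loader: Callable[[str], T]
) -> Tuple[Optional[str], Optional[T], Dict[str, str]]:
    """Try each source in order and return (source, result, errors) for the first that loads.

    If every source fails, source and result are None and errors records every failure.
    """
    errors: Dict[str, str] = {}
    for source in sources:
        try:
            return source, loader(source), errors
        except Exception as exc:  # broad on purpose, like the notebook cells
            errors[source] = f"{type(exc).__name__}: {exc}"
    return None, None, errors
```

In the notebook this might be called as `source, df_capitals, errors = first_successful(urls, lambda u: pd.read_html(u)[0])`, where `urls` is a list of the candidate pages; the `errors` dict then shows at a glance which sources returned 403 or 404.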


## Data loading

### Subtask:
Load the state capitals data into a pandas DataFrame.


**Reasoning**:
Previous attempts to load data from Wikipedia failed due to access restrictions. I will try to find a publicly available dataset on GitHub or a similar platform that is less likely to block automated access.



In [7]:
import pandas as pd

# Attempt to load data from a known public data repository (e.g., GitHub)
try:
    url = 'https://raw.githubusercontent.com/jasonong/List-of-US-States/master/states.csv'
    df_capitals = pd.read_csv(url)
    print("Successfully read data from GitHub.")
except Exception as e:
    print(f"Could not read data from GitHub. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Successfully read data from GitHub.


|   | State | Abbreviation |
|---|-------|--------------|
| 0 | Alabama | AL |
| 1 | Alaska | AK |
| 2 | Arizona | AZ |
| 3 | Arkansas | AR |
| 4 | California | CA |


**Reasoning**:
The previous command successfully loaded a DataFrame containing state abbreviations, but it does not include the capital cities. I need to find a data source that includes both state names and their capitals. I will try another public data repository that specifically lists state capitals.
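Cell In [7] shows that a load can succeed and still return the wrong data. One way to catch that automatically is to compare the loaded column names with the ones the task needs. The sketch assumes the target names are `State` and `Capital` (the names the notebook's final DataFrame uses); `missing_columns` is a hypothetical helper.

```python
from typing import Iterable, Set

# Assumption: the columns the finished df_capitals should contain.
REQUIRED_COLUMNS = frozenset({"State", "Capital"})


def missing_columns(columns: Iterable[str], required: Iterable[str] = REQUIRED_COLUMNS) -> Set[str]:
    """Return the required column names that are absent from `columns`."""
    return set(required) - set(columns)
```

Called as `missing_columns(df_capitals.columns)` on the CSV from In [7], whose columns are `State` and `Abbreviation`, it would return `{'Capital'}`, signalling that the source should be rejected even though no exception was raised.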



In [8]:
# Attempt to load data from another public data repository with state capitals
try:
    url = 'https://raw.githubusercontent.com/grammakov/USA-cities-and-states/master/us_states_and_capitals.csv'
    df_capitals = pd.read_csv(url)
    print("Successfully read data from GitHub.")
except Exception as e:
    print(f"Could not read data from GitHub. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Could not read data from GitHub. Error: HTTP Error 404: Not Found


**Reasoning**:
The previous attempt to load data from a GitHub repository failed because the file was not found (HTTP Error 404). I will try one more time to find a suitable dataset on a public repository. If this attempt fails, I will conclude that a reliable online data source could not be accessed programmatically for this subtask.
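The two failure modes seen so far call for different fixes: a 403 means the server refused the request, while a 404 means the URL itself points nowhere. When pandas fetches an `http(s)` URL through `urllib`, an HTTP failure surfaces as `urllib.error.HTTPError` (the source of the `HTTP Error 404: Not Found` text above), and its `code` attribute can be inspected. `describe_http_failure` is a hypothetical helper, and its advice strings are suggestions rather than diagnoses.

```python
import urllib.error


def describe_http_failure(exc: BaseException) -> str:
    """Turn a loading exception into a short hint about what to try next."""
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code == 403:
            return "forbidden: the server refused the request; try an explicit User-Agent or another source"
        if exc.code == 404:
            return "not found: check the repository, branch, and file name in the URL"
        return f"HTTP {exc.code}: {exc.reason}"
    return f"not an HTTP error: {type(exc).__name__}: {exc}"
```

Inside the `except Exception as e:` branches of the cells above, `print(describe_http_failure(e))` would replace the raw error message.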



In [9]:
# Attempt to load data from another public data repository with state capitals
try:
    url = 'https://raw.githubusercontent.com/smorestea/us-info/master/us_states_capitals.csv'
    df_capitals = pd.read_csv(url)
    print("Successfully read data from GitHub.")
except Exception as e:
    print(f"Could not read data from GitHub. Error: {e}")
    df_capitals = None

if df_capitals is not None:
    display(df_capitals.head())

Could not read data from GitHub. Error: HTTP Error 404: Not Found


## Data loading

### Subtask:
Load the state capitals data into a pandas DataFrame.


**Reasoning**:
Since every online source failed, create the DataFrame manually from a hard-coded dictionary of the 50 US states and their capitals.



In [10]:
import pandas as pd

# Create a dictionary of US states and their capitals
state_capitals_dict = {
    'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix',
    'Arkansas': 'Little Rock', 'California': 'Sacramento', 'Colorado': 'Denver',
    'Connecticut': 'Hartford', 'Delaware': 'Dover', 'Florida': 'Tallahassee',
    'Georgia': 'Atlanta', 'Hawaii': 'Honolulu', 'Idaho': 'Boise',
    'Illinois': 'Springfield', 'Indiana': 'Indianapolis', 'Iowa': 'Des Moines',
    'Kansas': 'Topeka', 'Kentucky': 'Frankfort', 'Louisiana': 'Baton Rouge',
    'Maine': 'Augusta', 'Maryland': 'Annapolis', 'Massachusetts': 'Boston',
    'Michigan': 'Lansing', 'Minnesota': 'Saint Paul', 'Mississippi': 'Jackson',
    'Missouri': 'Jefferson City', 'Montana': 'Helena', 'Nebraska': 'Lincoln',
    'Nevada': 'Carson City', 'New Hampshire': 'Concord', 'New Jersey': 'Trenton',
    'New Mexico': 'Santa Fe', 'New York': 'Albany', 'North Carolina': 'Raleigh',
    'North Dakota': 'Bismarck', 'Ohio': 'Columbus', 'Oklahoma': 'Oklahoma City',
    'Oregon': 'Salem', 'Pennsylvania': 'Harrisburg', 'Rhode Island': 'Providence',
    'South Carolina': 'Columbia', 'South Dakota': 'Pierre', 'Tennessee': 'Nashville',
    'Texas': 'Austin', 'Utah': 'Salt Lake City', 'Vermont': 'Montpelier',
    'Virginia': 'Richmond', 'Washington': 'Olympia', 'West Virginia': 'Charleston',
    'Wisconsin': 'Madison', 'Wyoming': 'Cheyenne'
}

# Convert the dictionary to a pandas DataFrame
df_capitals = pd.DataFrame(list(state_capitals_dict.items()), columns=['State', 'Capital'])

# Display the head of the DataFrame
display(df_capitals.head())

|   | State | Capital |
|---|-------|---------|
| 0 | Alabama | Montgomery |
| 1 | Alaska | Juneau |
| 2 | Arizona | Phoenix |
| 3 | Arkansas | Little Rock |
| 4 | California | Sacramento |
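Because the data is now typed in by hand, a quick sanity check guards against slips. A Python dict silently keeps only the last value for a repeated key, so a duplicated state shows up as a count below 50 rather than as an error. A minimal sketch, where `check_capitals` is a hypothetical helper:

```python
from typing import Dict, List


def check_capitals(mapping: Dict[str, str], expected_count: int = 50) -> List[str]:
    """Return a list of problems found in a state -> capital mapping (empty if none)."""
    problems: List[str] = []
    # A duplicated key collapses silently, so it surfaces here as a short count.
    if len(mapping) != expected_count:
        problems.append(f"expected {expected_count} states, found {len(mapping)}")
    for state, capital in mapping.items():
        if not isinstance(capital, str) or not capital.strip():
            problems.append(f"missing capital for {state}")
    return problems
```

In the notebook, `assert not check_capitals(state_capitals_dict)` placed right after the dictionary would stop execution if an entry were dropped or left blank.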


## Data display

### Subtask:
Display the state capitals data.


**Reasoning**:
Display the entire `df_capitals` DataFrame as instructed.



In [11]:
display(df_capitals)

|   | State | Capital |
|---|-------|---------|
| 0 | Alabama | Montgomery |
| 1 | Alaska | Juneau |
| 2 | Arizona | Phoenix |
| 3 | Arkansas | Little Rock |
| 4 | California | Sacramento |
| 5 | Colorado | Denver |
| 6 | Connecticut | Hartford |
| 7 | Delaware | Dover |
| 8 | Florida | Tallahassee |
| 9 | Georgia | Atlanta |
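The task asks for the capital of each state, so one sentence per state is an alternative to the table view. The sketch takes plain `(state, capital)` pairs, which `df_capitals.itertuples(index=False, name=None)` yields for the two-column DataFrame built above; `capital_sentences` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def capital_sentences(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Format each (state, capital) pair as one readable sentence."""
    return [f"The capital of {state} is {capital}." for state, capital in pairs]


# Usage in the notebook (not executed here):
# for line in capital_sentences(df_capitals.itertuples(index=False, name=None)):
#     print(line)
```

To keep the tabular form while making sure every row is printed regardless of pandas' `display.max_rows` setting, `print(df_capitals.to_string(index=False))` is another option.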


## Summary:

### Data Analysis Key Findings

*   Initial attempts to load the state capitals data from online sources were unsuccessful: every Wikipedia request was rejected with HTTP Error 403 (Forbidden), two GitHub URLs returned HTTP Error 404 (Not Found), and the one GitHub CSV that did load contained only state names and abbreviations, not capitals.
*   To overcome the data loading issue, a dictionary containing US states and their capitals was manually created.
*   The manually created dictionary was successfully converted into a pandas DataFrame named `df_capitals`, with 'State' and 'Capital' as column headers.
*   The final DataFrame containing all US states and their capitals was successfully displayed.

### Insights or Next Steps

*   For future tasks requiring similar data, consider using a pre-existing, accessible dataset, or embed the data directly in the script if it is static and small, rather than relying on web scraping or on URLs that may move or block automated requests.
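One way to embed small, static data is as a CSV string inside the notebook, which keeps it in a readable tabular form with no network access at all. The sketch uses the standard-library `csv` module and lists only three rows for brevity; `CAPITALS_CSV` and `load_capitals` are hypothetical names.

```python
import csv
import io
from typing import Dict

# Abbreviated for illustration; a real notebook would list all 50 states.
CAPITALS_CSV = """State,Capital
Alabama,Montgomery
Alaska,Juneau
Arizona,Phoenix
"""


def load_capitals(text: str = CAPITALS_CSV) -> Dict[str, str]:
    """Parse an embedded State,Capital CSV into a state -> capital dict."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["State"]: row["Capital"] for row in reader}


# The same string also loads straight into pandas (not executed here):
# import pandas as pd
# df_capitals = pd.read_csv(io.StringIO(CAPITALS_CSV))
```

Compared with the dictionary literal used above, the CSV form can be copied to or from a spreadsheet unchanged, at the cost of one parsing step.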
