# **High School News Websites: Where Are They and Where Are They Not**

## **Visualizing the Penetration of Student Newspapers Online**

### Overview 

The questions this project aimed to address are:
1. Where in the United States are high schools with student journalism programs located?
2. What are the differences between high schools with and without student journalism programs?

These questions are fundamental to my work as Knight Chair in Scholastic Journalism. My position was created to support high school journalism programs across the United States. To fulfill this mission effectively, I must first understand, and be able to convey to fellow advocates, where student journalism programs are located and what differentiates schools that offer journalism experiences from those that do not. Fellow advocates and I can then use this baseline knowledge to tailor journalism education initiatives.

I hope that a future iteration of this work will live on the [Center for Scholastic Journalism](https://www.kent.edu/mdj/csj) website and serve as a useful tool for understanding the scope of student journalism in the United States, and the discrepancies that exist in this space.

### Data Sources 

#### Source 1: National Center for Education Statistics

The [National Center for Education Statistics](https://nces.ed.gov/) (NCES), a branch of the U.S. Department of Education, is the official source of information about U.S. schools. I used its "Public School Characteristics 2021-22" dataset as the basis for the project. Although the dataset captures information from two years ago, it is the most recent version of this data.

The original file contained 101,278 records of K-12 public schools, including each school's latitude and longitude, school profile (e.g., regular, alternative, virtual), locale (e.g., rural, urban), grade composition, and student population characteristics (i.e., total per grade, race/ethnicity, number eligible for free or reduced-price lunch). 

    

#### Source 2: Student Newspapers Online (SNO)

[Student Newspapers Online](https://snosites.com/) (SNO) is the primary host of student news websites in the United States. The company lists [its clients on its website](https://snosites.com/our-customers/). While this is not a comprehensive list of high school news products, it is the most comprehensive list available in one place. See the Limitations section for a discussion of the shortcomings of using this data in this project.

The original file, copied and pasted from the SNO website, contained about 3,500 entries, each listing a website URL, the client (school) name, and the city and state where the school is located.

### Data Wrangling and Analysis

#### Data Cleaning

I used Excel to trim both datasets. In the NCES dataset, I eliminated schools that did not include any high school grades (9, 10, 11, 12) and schools that are not likely to host student media (i.e., alternative schools, virtual schools). I also eliminated redundant columns and columns that would not be used in the visualization. The resulting dataset contained 22,398 high schools and 43 columns of data.
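The Excel filtering steps above could also be expressed in pandas. The sketch below is a minimal illustration, not the project's actual procedure: the column names `GradeLow`, `GradeHigh`, and `SchoolType` and the toy rows are hypothetical, since the real NCES file encodes grade spans and school profiles in its own format.

```python
import pandas as pd

# Toy stand-in for the NCES file. "GradeLow", "GradeHigh" and "SchoolType"
# are hypothetical column names chosen for this illustration.
schools = pd.DataFrame({
    "School": ["North High", "Elm Middle", "Online Academy", "Central K-12"],
    "GradeLow": [9, 6, 9, 0],
    "GradeHigh": [12, 8, 12, 12],
    "SchoolType": ["Regular", "Regular", "Virtual", "Regular"],
})

# A school serves at least one high school grade when its grade span overlaps 9-12:
# it starts no later than grade 12 and ends at grade 9 or later.
offers_hs_grade = (schools["GradeLow"] <= 12) & (schools["GradeHigh"] >= 9)

# Drop school profiles unlikely to host student media.
likely_media_host = ~schools["SchoolType"].isin(["Alternative", "Virtual"])

high_schools = schools[offers_hs_grade & likely_media_host]
print(high_schools["School"].tolist())
```

Testing for overlap rather than for an exact 9-12 span keeps combined schools (e.g., K-12 or 7-12) in the dataset, which matches the "any high school grades" criterion described above.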

In the SNO dataset, I eliminated all cases that represented college student media, non-U.S. schools, and organizational websites, resulting in a dataset of 2,574 school websites. Each case included six columns: case id, the news website name, URL, school name, city and state.

#### Merging Data

Merging the NCES and SNO datasets was a key challenge. The SNO dataset did not contain the numerical school identifiers used in the NCES dataset, and in many cases the SNO school names did not exactly match the NCES school names (e.g., West High School (in Anchorage) vs. Anchorage West High). An exact-match merge using `pandas.merge` matched only 27% of the SNO schools: those whose names were identical in both datasets.
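A minimal sketch of the exact-match step, using toy data. The ID column `NCESSCH`, the uppercase normalization, and merging on state as well as name are assumptions made for illustration; `indicator=True` adds a `_merge` column that makes it easy to count how many SNO rows found a partner.

```python
import pandas as pd

# Toy stand-ins for the two cleaned datasets (names and IDs are invented).
nces = pd.DataFrame({
    "NCESSCH": ["001", "002", "003"],
    "School": ["ANCHORAGE WEST HIGH", "CENTRAL HIGH SCHOOL", "LINCOLN HIGH SCHOOL"],
    "State": ["AK", "TX", "NE"],
})
sno = pd.DataFrame({
    "School": ["West High School", "Central High School", "Lincoln High School"],
    "State": ["AK", "TX", "NE"],
})

# Normalize case and surrounding whitespace so formatting alone does not block a match.
for df in (nces, sno):
    df["name_key"] = df["School"].str.upper().str.strip()

# A left merge keeps every SNO row; "_merge" records "both" or "left_only".
merged = sno.merge(nces[["NCESSCH", "name_key", "State"]],
                   on=["name_key", "State"], how="left", indicator=True)

match_rate = (merged["_merge"] == "both").mean() * 100
print(f"{match_rate:.0f}% of SNO schools matched exactly")
```

The unmatched rows (`_merge == "left_only"`) are the ones that need fuzzy matching or manual review.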

I wrote a script that combined the `fuzzywuzzy` package with human input to match school names that approximated each other. With this approach, 66% of the schools in the SNO dataset were matched to the NCES dataset. I then checked the remaining 873 cases by hand. The merge script had missed about 100 schools; the others did not match the NCES dataset because they were private schools, middle schools, or not schools at all.
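The project's matching script is not reproduced here. As a rough sketch of the same idea, the hypothetical helpers below rank candidate NCES names by string similarity and let a person accept or reject each suggestion. Python's standard-library `difflib.SequenceMatcher` stands in for `fuzzywuzzy`, and the 0.6 threshold and three-candidate limit are arbitrary assumptions.

```python
from difflib import SequenceMatcher


def best_candidates(sno_name, nces_names, limit=3, threshold=0.6):
    """Hypothetical helper: return up to `limit` (name, score) pairs, best first."""
    scored = []
    for name in nces_names:
        # Case-insensitive similarity ratio between 0.0 and 1.0
        ratio = SequenceMatcher(None, sno_name.lower(), name.lower()).ratio()
        if ratio >= threshold:
            scored.append((name, round(ratio, 2)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def match_with_review(sno_names, nces_names, confirm):
    """Hypothetical helper: `confirm(sno, candidate, score)` returns True to accept.

    Returns {sno_name: accepted NCES name, or None if every candidate was rejected}.
    """
    matches = {}
    for sno_name in sno_names:
        matches[sno_name] = None
        for candidate, score in best_candidates(sno_name, nces_names):
            if confirm(sno_name, candidate, score):
                matches[sno_name] = candidate
                break
    return matches
```

In a notebook, `confirm` could prompt the reviewer, for example `lambda s, c, r: input(f"{s} -> {c} ({r})? [y/n] ") == "y"`. Names left as `None` become the cases to check by hand.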

The merge resulted in 1,789 schools out of 22,398 (8%) having a SNO website associated with them in the final dataset.

#### Visualizations

I used plotly maps (`plotly.graph_objects.Scattergeo`) to visualize the locations of public high schools and high schools with SNO websites across the United States. I used accompanying plotly bar graphs (`plotly.graph_objects.Bar`) to visualize the proportions of schools with and without SNO websites.

On the maps, schools without SNO websites are marked with dark green circle outlines set at 0.25 opacity. The low opacity helps distinguish individual schools in densely populated areas, where school locations overlap significantly. Schools with SNO websites are marked with contrasting filled dark magenta circles set at 0.75 opacity.

Hover text is customized. Hovering over a school without a SNO website displays the school's name, city, and state. Hovering over a school with a SNO site displays the same information, plus the news publication's name and URL. To increase legibility, states are presented using AP Style abbreviations, and URLs are displayed without the "http://" or "https://" prefix and without the trailing slash.
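The URL and state formatting described above can be sketched with pandas string methods. This is a minimal example on toy rows (the URLs are placeholders). A single `^https?://|/$` pattern covers both scheme prefixes, and the `fillna` fallback for codes missing from the AP mapping is an optional addition, not part of the project's code.

```python
import pandas as pd

# Small excerpt of the AP Style mapping used in the code cells.
ap_states = {"AK": "Alaska", "OH": "Ohio", "TX": "Texas"}

rows = pd.DataFrame({
    "State": ["OH", "TX", "PR"],
    "URL": ["https://example-news.com/", "http://example-times.org", None],
})

# One regex removes an http:// or https:// prefix and a trailing slash;
# missing URLs (schools without a SNO site) stay missing.
rows["URL_display"] = rows["URL"].str.replace(r"^https?://|/$", "", regex=True)

# Fall back to the postal code when it has no AP entry, so hover text never shows "nan".
rows["State_AP"] = rows["State"].map(ap_states).fillna(rows["State"])
```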

To increase coherence across the visualizations, the bar graphs use the same colors as the school locations on the maps. Hover text is disabled on the bar graphs. Instead, each bar displays the percent value it represents, and the graph's title and x-axis labels supply the context needed to interpret it.

#### Analysis and Insights

Following the "overview first, zoom and filter, details on demand" approach, I first present the map of all public U.S. high schools and all the SNO websites at these schools. The accompanying bar graph shows that 8% of the schools have a SNO site and 92% do not.

I then visualize some of the known discrepancies in the availability of journalism opportunities in high schools. The first of these relates to a school's location. The NCES dataset contains a 12-level location-based categorization of schools, which I collapse into three categories of rural, urban, and suburban. Three maps show rural, urban, and suburban schools, and among these identify the schools with SNO websites. The accompanying bar graphs show how the proportions of schools with and without SNO sites differ based on schools' locations.
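Each bar graph reports the share of schools in one category that have a SNO site. Because `Present` is coded 0/1, its mean within a group is exactly that share, so a single `groupby` can produce the percentages for every locale at once. This is a sketch on toy data, offered as an alternative to computing each subset separately, which is how the code cells below do it.

```python
import pandas as pd

# Toy data: "Locale" is the collapsed category; "Present" is 1 when a school has a SNO site.
schools = pd.DataFrame({
    "Locale": ["Rural", "Rural", "Rural", "Rural", "Urban", "Urban", "Suburban", "Suburban"],
    "Present": [0, 0, 0, 1, 0, 1, 1, 1],
})

# Mean of a 0/1 column = share of 1s, computed per locale in one step.
with_site = (schools.groupby("Locale")["Present"].mean() * 100).round(1)
without_site = (100 - with_site).round(1)
print(with_site)
```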

I also show discrepancies related to student populations, specifically the majority race/ethnicity and the socioeconomic status of a school's students. Using data in the NCES dataset, I categorize schools as either majority White or majority non-White, and as either majority poor or majority affluent. Student race/ethnicity counts are provided in the NCES dataset for each school. Poverty/affluence is based on the number of students in each school who qualify for free or reduced-price lunch. Some states do not report this information, so schools from those states are not included in the final visualizations and bar graphs.
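A minimal sketch of the two categorizations on toy data. The race/ethnicity codes match those in the code cells; `FRL` and `TOTAL` are hypothetical column names, and the 50% cutoffs are assumptions mirroring the majority definitions above. The key detail is that White enrollment must appear in the denominator, since the share is relative to all students.

```python
import pandas as pd

# Toy counts. WH, AM, AS, BL, HP, HI, TR are the race/ethnicity columns used in the
# code cells; "FRL" and "TOTAL" are hypothetical column names.
schools = pd.DataFrame({
    "WH": [300, 100, 200], "AM": [0, 10, 0], "AS": [50, 40, 0],
    "BL": [100, 200, 100], "HP": [0, 0, 0], "HI": [40, 140, 100],
    "TR": [10, 10, 0],
    "FRL": [150, 400, None],  # None: a state that does not report FRL counts
    "TOTAL": [500, 500, 400],
})

race_cols = ["WH", "AM", "AS", "BL", "HP", "HI", "TR"]

# White students as a share of ALL students, so WH belongs in the denominator too.
schools["White_Percent"] = schools["WH"] / schools[race_cols].sum(axis=1) * 100
schools["Majority_White"] = schools["White_Percent"] >= 50

# Drop schools without FRL data first: NaN >= 50 evaluates to False, which would
# otherwise silently count those schools as "affluent".
reported = schools.dropna(subset=["FRL"]).copy()
reported["Majority_Poor"] = reported["FRL"] / reported["TOTAL"] * 100 >= 50
```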

In all, the visualizations and bar graphs reflect and underscore established discrepancies in the types of schools that are more and less likely to support student journalism and media programs. 

### Limitations and Future Work

While Student Newspapers Online provides a convenient source of data about student journalism products, it does not provide a comprehensive listing of student media. Student journalism programs can use other platforms to publish their work (e.g., Wix, YouTube). A systematic search for these products was not possible within the scope of this class. I hope to complement the list of SNO websites with the results of future searches for school media products on Google, YouTube, and other social media platforms.

I limited the schools in this project to public schools. Some private schools also support student journalism, as illustrated by the few hundred private schools in the original SNO dataset. Expanding the visualizations to encompass private schools may be an easy first step in supplementing the data presented in this project.  

The visualizations in the presentation are repetitive. There are ways to present the same information this project contains using fewer maps and bar graphs. With more time, I will be able to consolidate these visualizations into fewer frames and use interactive features like buttons or sliders to allow users to navigate through all the information without the need for scrolling.

The visualizations use the geographic 'usa' scope that comes standard with plotly. It works well for displaying the entire country, but it does not provide details, such as state or city labels, that would be useful when zooming in. Finding and working with more detailed maps can also be a task for future iterations of this project.

### Code

The next five cells contain the code for the visualizations. To see the visualizations with accompanying text descriptions of each, please run these cells and then scroll below them.

The cells contain some redundant elements; that is, some code could be run once in one cell and then reused in subsequent cells. The redundancies ensure that each cell can run on its own.
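To illustrate how the repeated styling code could instead be defined once and reused, the hypothetical helper below builds all five marker mappings from a DataFrame's `Present` column. It is a sketch, not part of the project's notebook; the mapping values are copied from the code cells.

```python
import pandas as pd

# Marker mappings repeated in every cell, keyed by the Scattergeo marker property they set.
STYLE_MAPS = {
    "color": {1: "darkmagenta", 0: "darkgreen"},
    "line_color": {1: "darkmagenta", 0: "darkgreen"},
    "opacity": {1: 0.75, 0: 0.25},
    "symbol": {1: "circle", 0: "circle-open"},
    "size": {1: 5, 0: 4},
}


def marker_style(df):
    """Hypothetical helper: map df['Present'] through each style mapping."""
    return {key: df["Present"].map(mapping) for key, mapping in STYLE_MAPS.items()}
```

A cell could then pass `marker = marker_style(ruralschools)` to `go.Scattergeo` in place of the five separately built Series.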

In [1]:
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
# This cell contains code for the map and bar graph of all public schools and their SNO sites.

# Set up a two-column space for the map and the bar graph
# (A five-row grid creates a map and a bar graph that are roughly the same height)

fig1 = make_subplots(rows = 5, 
                     cols = 2, 
                     column_widths = [0.7, 0.3],
                     subplot_titles = ("All High Schools", "Percent of High Schools"),
                     specs=[
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None]
                     ])

fig1.update_layout(
    title = 'Public High Schools in the United States, with and without Student Newspapers Online (SNO) Websites',
    legend = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.95,
        xanchor= 'center',
        x = 0.30),
    height = 700)

# Set up the dataframe

alldata = pd.read_csv('thirdmerge_alldata.csv')

# Set up color, opacity, symbol, size maps for map markers

color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
school_color = alldata['Present'].map(color_map)

line_color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
school_line_color = alldata['Present'].map(line_color_map)

opacity_map = {1 : 0.75, 0 : 0.25}
school_opacity = alldata['Present'].map(opacity_map)

symbol_map = {1 : 'circle', 0 : 'circle-open'}
school_symbol = alldata['Present'].map(symbol_map)

size_map = {1 : 5, 0 : 4}
school_size = alldata['Present'].map(size_map)

# Set up state display on hover using AP Style state abbreviations

ap_state_abbreviations = {"AL": "Ala.", "AK": "Alaska", "AZ": "Ariz.", "AR": "Ark.", "CA": "Calif.", "CO": "Colo.", "CT": "Conn.", "DE": "Del.", "FL": "Fla.",\
    "GA": "Ga.", "HI": "Hawaii", "ID": "Idaho", "IL": "Ill.", "IN": "Ind.", "IA": "Iowa", "KS": "Kan.", "KY": "Ky.", "LA": "La.", "ME": "Maine",\
    "MD": "Md.", "MA": "Mass.", "MI": "Mich.", "MN": "Minn.", "MS": "Miss.", "MO": "Mo.", "MT": "Mont.", "NE": "Neb.", "NV": "Nev.", "NH": "N.H.",\
    "NJ": "N.J.", "NM": "N.M.", "NY": "N.Y.", "NC": "N.C.", "ND": "N.D.", "OH": "Ohio", "OK": "Okla.", "OR": "Ore.", "PA": "Pa.", "RI": "R.I.",\
    "SC": "S.C.", "SD": "S.D.", "TN": "Tenn.", "TX": "Texas", "UT": "Utah", "VT": "Vt.", "VA": "Va.", "WA": "Wash.", "WV": "W.Va.", "WI": "Wis.",\
    "WY": "Wyo.", "DC": "D.C."}

alldata['State_AP'] = alldata['State'].map(ap_state_abbreviations)

# Set up URL display on hover

# Strip the "http://" or "https://" prefix and any trailing slash
alldata['URL'] = alldata['URL'].str.replace(r'^https?://|/$', '', regex=True)

# Set up map hover information

def custom_hover_text(row):
    # Plotly renders <br> as a line break in hover labels
    text = f"{row['School'].title()}<br>{row['City'].title()}, {row['State_AP']}"
    if row['Present'] == 1:
        text += f"<br>{row['Media']}<br>{row['URL']}"
    return text

# Styling the map

fig1.add_trace(
    go.Scattergeo(lon = alldata['X'],
                  lat = alldata['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = alldata.apply(custom_hover_text, axis=1),
                  marker = dict(color = school_color,
                                line_color = school_line_color,
                                opacity = school_opacity,
                                symbol = school_symbol,
                                size = school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 1, col = 1)  

fig1.update_geos(
    scope = 'usa')

# Calculate % of schools with and without SNO sites

have_media_percent = round(((alldata['Present'] == 1).mean() * 100),1)
have_not_media_percent = round(((alldata['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig1.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = have_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

fig1.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = have_not_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

# Show both visualizations

# fig1.show()

In [None]:
# This cell contains code for the maps and bar graphs of rural, urban, and suburban public schools and their SNO sites.

# Set up a two-column space for the maps and the bar graphs

fig2 = make_subplots(rows = 15, 
                     cols = 2, 
                     column_widths = [0.7, 0.3],
                     subplot_titles = ("Rural High Schools",
                                       "Percent of Rural High Schools",
                                       'Urban High Schools',
                                       'Percent of Urban High Schools',
                                       'Suburban High Schools',
                                       'Percent of Suburban High Schools'),
                     specs=[
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None],
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None],
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None]
                     ])

fig2.update_layout(
    title = 'Location Differences Among Public High Schools with and without Student Newspapers Online (SNO) Websites',
    legend = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.98,
        xanchor= 'center',
        x = 0.30),
    legend2 = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.64,
        xanchor= 'center',
        x = 0.30),
    legend3 = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.29,
        xanchor= 'center',
        x = 0.30),
    height = 2100)

fig2.update_geos(
    scope = 'usa')

# Set up the base dataframe

alldata = pd.read_csv('thirdmerge_alldata.csv')

## Make adjustments to the base dataframe for the purpose of hover text in how state and URL are displayed

ap_state_abbreviations = {"AL": "Ala.", "AK": "Alaska", "AZ": "Ariz.", "AR": "Ark.", "CA": "Calif.", "CO": "Colo.", "CT": "Conn.", "DE": "Del.", "FL": "Fla.",\
    "GA": "Ga.", "HI": "Hawaii", "ID": "Idaho", "IL": "Ill.", "IN": "Ind.", "IA": "Iowa", "KS": "Kan.", "KY": "Ky.", "LA": "La.", "ME": "Maine",\
    "MD": "Md.", "MA": "Mass.", "MI": "Mich.", "MN": "Minn.", "MS": "Miss.", "MO": "Mo.", "MT": "Mont.", "NE": "Neb.", "NV": "Nev.", "NH": "N.H.",\
    "NJ": "N.J.", "NM": "N.M.", "NY": "N.Y.", "NC": "N.C.", "ND": "N.D.", "OH": "Ohio", "OK": "Okla.", "OR": "Ore.", "PA": "Pa.", "RI": "R.I.",\
    "SC": "S.C.", "SD": "S.D.", "TN": "Tenn.", "TX": "Texas", "UT": "Utah", "VT": "Vt.", "VA": "Va.", "WA": "Wash.", "WV": "W.Va.", "WI": "Wis.",\
    "WY": "Wyo.", "DC": "D.C."}

alldata['State_AP'] = alldata['State'].map(ap_state_abbreviations)

# Strip the "http://" or "https://" prefix and any trailing slash
alldata['URL'] = alldata['URL'].str.replace(r'^https?://|/$', '', regex=True)

# Set up the three 'child' dataframes

locales_combined = {"11-City: Large": "Urban", "12-City: Mid-size": "Suburban", "13-City: Small": "Suburban",\
                    "21-Suburb: Large": "Suburban", "22-Suburb: Mid-size": "Suburban", "23-Suburb: Small": "Suburban",\
                    "31-Town: Fringe": "Suburban", "32-Town: Distant": "Rural", "33-Town: Remote": "Rural",\
                    "41-Rural: Fringe": "Suburban", "42-Rural: Distant": "Rural", "43-Rural: Remote": "Rural"}

alldata['Locale'] = alldata['ULOCALE'].map(locales_combined)

ruralschools = alldata[alldata['Locale'] == 'Rural']
urbanschools = alldata[alldata['Locale'] == 'Urban']
suburbanschools = alldata[alldata['Locale'] == 'Suburban']

# Set up colors, opacity, symbols, size maps for markers in all maps

color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
rural_school_color = ruralschools['Present'].map(color_map)
urban_school_color = urbanschools['Present'].map(color_map)
suburban_school_color = suburbanschools['Present'].map(color_map)

line_color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
rural_school_line_color = ruralschools['Present'].map(line_color_map)
urban_school_line_color = urbanschools['Present'].map(line_color_map)
suburban_school_line_color = suburbanschools['Present'].map(line_color_map)

opacity_map = {1 : 0.75, 0 : 0.25}
rural_school_opacity = ruralschools['Present'].map(opacity_map)
urban_school_opacity = urbanschools['Present'].map(opacity_map)
suburban_school_opacity = suburbanschools['Present'].map(opacity_map)

symbol_map = {1 : 'circle', 0 : 'circle-open'}
rural_school_symbol = ruralschools['Present'].map(symbol_map)
urban_school_symbol = urbanschools['Present'].map(symbol_map)
suburban_school_symbol = suburbanschools['Present'].map(symbol_map)

size_map = {1 : 5, 0 : 4}
rural_school_size = ruralschools['Present'].map(size_map)
urban_school_size = urbanschools['Present'].map(size_map)
suburban_school_size = suburbanschools['Present'].map(size_map)

# Set up map hover information

def custom_hover_text(row):
    # Plotly renders <br> as a line break in hover labels
    text = f"{row['School'].title()}<br>{row['City'].title()}, {row['State_AP']}"
    if row['Present'] == 1:
        text += f"<br>{row['Media']}<br>{row['URL']}"
    return text

# **************************************** F I R S T   R O W ****************************************

# Styling the map

fig2.add_trace(
    go.Scattergeo(lon = ruralschools['X'],
                  lat = ruralschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = ruralschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = rural_school_color, line_color = rural_school_line_color, opacity = rural_school_opacity, symbol = rural_school_symbol, size = rural_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 1, col = 1)  

# Calculate % of schools with and without SNO sites

rural_have_media_percent = round(((ruralschools['Present'] == 1).mean() * 100),1)
rural_have_not_media_percent = round(((ruralschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig2.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [rural_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = rural_have_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

fig2.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [rural_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = rural_have_not_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

# **************************************** S E C O N D   R O W **************************************** 

# Styling the map

fig2.add_trace(
    go.Scattergeo(lon = urbanschools['X'],
                  lat = urbanschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = urbanschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = urban_school_color, line_color = urban_school_line_color, opacity = urban_school_opacity, symbol = urban_school_symbol, size = urban_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 6, col = 1)  

# Calculate % of schools with and without SNO sites

urban_have_media_percent = round(((urbanschools['Present'] == 1).mean() * 100),1)
urban_have_not_media_percent = round(((urbanschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig2.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [urban_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = urban_have_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

fig2.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [urban_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = urban_have_not_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

# **************************************** T H I R D   R O W ****************************************  

# Styling the map

fig2.add_trace(
    go.Scattergeo(lon = suburbanschools['X'],
                  lat = suburbanschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = suburbanschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = suburban_school_color, line_color = suburban_school_line_color, opacity = suburban_school_opacity, symbol = suburban_school_symbol, size = suburban_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 11, col = 1)  

fig2.update_geos(
    scope = 'usa')

# Calculate % of schools with and without SNO sites

suburban_have_media_percent = round(((suburbanschools['Present'] == 1).mean() * 100),1)
suburban_have_not_media_percent = round(((suburbanschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig2.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [suburban_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = suburban_have_media_percent,
    hoverinfo = 'none',
    legend = 'legend3'
), row = 11, col = 2)

fig2.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [suburban_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = suburban_have_not_media_percent,
    hoverinfo = 'none',
    legend = 'legend3'
), row = 11, col = 2)

# Show visualizations

# fig2.show()

In [None]:
# This cell contains code for the maps and bar graphs of majority non-White and White public schools and their SNO sites.

# Set up a two-column space for the maps and the bar graphs

fig3 = make_subplots(rows = 10, 
                     cols = 2, 
                     column_widths = [0.7, 0.3],
                     subplot_titles = ('Majority Non-White High Schools',
                                       'Percent of Majority Non-White High Schools',
                                       'Majority White High Schools',
                                       'Percent of Majority White High Schools'),
                     specs=[
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None],
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None]
                     ])

fig3.update_layout(
    title = 'Student Race/Ethnicity Differences in Public High Schools with and without Student Newspapers Online (SNO) Websites',
    legend = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.95,
        xanchor= 'center',
        x = 0.30),
    legend2 = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.45,
        xanchor= 'center',
        x = 0.30),
    height = 1400)

fig3.update_geos(
    scope = 'usa')

# Set up the base dataframe

alldata = pd.read_csv('thirdmerge_alldata.csv')

## Make adjustments to the base dataframe for the purpose of hover text in how state and URL are displayed

ap_state_abbreviations = {"AL": "Ala.", "AK": "Alaska", "AZ": "Ariz.", "AR": "Ark.", "CA": "Calif.", "CO": "Colo.", "CT": "Conn.", "DE": "Del.", "FL": "Fla.",\
    "GA": "Ga.", "HI": "Hawaii", "ID": "Idaho", "IL": "Ill.", "IN": "Ind.", "IA": "Iowa", "KS": "Kan.", "KY": "Ky.", "LA": "La.", "ME": "Maine",\
    "MD": "Md.", "MA": "Mass.", "MI": "Mich.", "MN": "Minn.", "MS": "Miss.", "MO": "Mo.", "MT": "Mont.", "NE": "Neb.", "NV": "Nev.", "NH": "N.H.",\
    "NJ": "N.J.", "NM": "N.M.", "NY": "N.Y.", "NC": "N.C.", "ND": "N.D.", "OH": "Ohio", "OK": "Okla.", "OR": "Ore.", "PA": "Pa.", "RI": "R.I.",\
    "SC": "S.C.", "SD": "S.D.", "TN": "Tenn.", "TX": "Texas", "UT": "Utah", "VT": "Vt.", "VA": "Va.", "WA": "Wash.", "WV": "W.Va.", "WI": "Wis.",\
    "WY": "Wyo.", "DC": "D.C."}

alldata['State_AP'] = alldata['State'].map(ap_state_abbreviations)

# Strip the "http://" or "https://" prefix and any trailing slash
alldata['URL'] = alldata['URL'].str.replace(r'^https?://|/$', '', regex=True)

# Set up the two 'child' dataframes

# White students as a share of all students, so WH is included in the denominator
alldata['White_Percent'] = round((alldata['WH']/(alldata['WH']+alldata['AM']+alldata['AS']+alldata['BL']+alldata['HP']+alldata['HI']+alldata['TR']) * 100),1)

nonwhiteschools = alldata[alldata['White_Percent'] < 50]
whiteschools = alldata[alldata['White_Percent'] >= 50]

# Set up colors, opacity, symbols, size maps for markers in all maps

color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
nonwhite_school_color = nonwhiteschools['Present'].map(color_map)
white_school_color = whiteschools['Present'].map(color_map)

line_color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
nonwhite_school_line_color = nonwhiteschools['Present'].map(line_color_map)
white_school_line_color = whiteschools['Present'].map(line_color_map)

opacity_map = {1 : 0.75, 0 : 0.25}
nonwhite_school_opacity = nonwhiteschools['Present'].map(opacity_map)
white_school_opacity = whiteschools['Present'].map(opacity_map)

symbol_map = {1 : 'circle', 0 : 'circle-open'}
nonwhite_school_symbol = nonwhiteschools['Present'].map(symbol_map)
white_school_symbol = whiteschools['Present'].map(symbol_map)

size_map = {1 : 5, 0 : 4}
nonwhite_school_size = nonwhiteschools['Present'].map(size_map)
white_school_size = whiteschools['Present'].map(size_map)

# Set up map hover information

def custom_hover_text(row):
    # Plotly renders <br> as a line break in hover labels
    text = f"{row['School'].title()}<br>{row['City'].title()}, {row['State_AP']}"
    if row['Present'] == 1:
        text += f"<br>{row['Media']}<br>{row['URL']}"
    return text

# **************************************** F I R S T   R O W ****************************************

# Styling the map

fig3.add_trace(
    go.Scattergeo(lon = nonwhiteschools['X'],
                  lat = nonwhiteschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = nonwhiteschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = nonwhite_school_color, line_color = nonwhite_school_line_color, opacity = nonwhite_school_opacity, symbol = nonwhite_school_symbol, size = nonwhite_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 1, col = 1)  

# Calculate % of schools with and without SNO sites

nonwhite_have_media_percent = round(((nonwhiteschools['Present'] == 1).mean() * 100),1)
nonwhite_have_not_media_percent = round(((nonwhiteschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig3.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [nonwhite_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = nonwhite_have_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

fig3.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [nonwhite_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = nonwhite_have_not_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

# **************************************** S E C O N D   R O W **************************************** 

# Styling the map

fig3.add_trace(
    go.Scattergeo(lon = whiteschools['X'],
                  lat = whiteschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = whiteschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = white_school_color, line_color = white_school_line_color, opacity = white_school_opacity, symbol = white_school_symbol, size = white_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 6, col = 1)  

# Calculate % of schools with and without SNO sites

white_have_media_percent = round(((whiteschools['Present'] == 1).mean() * 100),1)
white_have_not_media_percent = round(((whiteschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig3.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [white_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = white_have_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

fig3.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [white_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = white_have_not_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

# Show visualizations

# fig3.show()

In [None]:
# This cell contains code for the maps and bar graphs of majority-poor and affluent public schools and SNO sites.

# Set up a two-column space for the maps and the bar graphs

fig4 = make_subplots(rows = 10, 
                     cols = 2, 
                     column_widths = [0.7, 0.3],
                     subplot_titles = ('Majority Poor High Schools',
                                       'Percent of Majority Poor High Schools',
                                       'Majority Affluent High Schools',
                                       'Percent of Majority Affluent High Schools'),
                     specs=[
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None],
                         [{'type': 'scattergeo', 'rowspan': 5}, {'type': 'bar', 'rowspan': 4}],
                         [None, None],
                         [None, None],
                         [None, None],
                         [None, None]
                     ])

fig4.update_layout(
    title = 'Student Socioeconomic Differences in Public High Schools with and without Student Newspapers Online (SNO) Websites',
    legend = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.95,
        xanchor= 'center',
        x = 0.30),
    legend2 = dict(
        orientation = 'h',
        yanchor = 'top',
        y = 0.45,
        xanchor= 'center',
        x = 0.30),
    height = 1400)

fig4.update_geos(
    scope = 'usa')

# Set up the base dataframe

alldata = pd.read_csv('thirdmerge_alldata.csv')

# Adjust how state names and URLs are displayed in the hover text

# AP-style state abbreviations for the hover text ("DC" is included so that
# District of Columbia schools are not displayed as NaN)
ap_state_abbreviations = {
    "AL": "Ala.", "AK": "Alaska", "AZ": "Ariz.", "AR": "Ark.", "CA": "Calif.", "CO": "Colo.", "CT": "Conn.",
    "DE": "Del.", "DC": "D.C.", "FL": "Fla.", "GA": "Ga.", "HI": "Hawaii", "ID": "Idaho", "IL": "Ill.",
    "IN": "Ind.", "IA": "Iowa", "KS": "Kan.", "KY": "Ky.", "LA": "La.", "ME": "Maine", "MD": "Md.",
    "MA": "Mass.", "MI": "Mich.", "MN": "Minn.", "MS": "Miss.", "MO": "Mo.", "MT": "Mont.", "NE": "Neb.",
    "NV": "Nev.", "NH": "N.H.", "NJ": "N.J.", "NM": "N.M.", "NY": "N.Y.", "NC": "N.C.", "ND": "N.D.",
    "OH": "Ohio", "OK": "Okla.", "OR": "Ore.", "PA": "Pa.", "RI": "R.I.", "SC": "S.C.", "SD": "S.D.",
    "TN": "Tenn.", "TX": "Texas", "UT": "Utah", "VT": "Vt.", "VA": "Va.", "WA": "Wash.", "WV": "W.Va.",
    "WI": "Wis.", "WY": "Wyo."}

# Any state code missing from the mapping (e.g., a territory) falls back to its original code
alldata['State_AP'] = alldata['State'].map(ap_state_abbreviations).fillna(alldata['State'])

# Strip the http:// or https:// prefix and any trailing slash from each URL
alldata['URL'] = alldata['URL'].str.replace(r'^https?://|/$', '', regex=True)

# Set up the two 'child' dataframes

alldata['Lunch_Percent'] = round((alldata['TOTFRL']/alldata['TOTAL'] * 100),1)

poorschools = alldata[alldata['Lunch_Percent'] > 50]
affluentschools = alldata[alldata['Lunch_Percent'] <= 50]

# Set up colors, opacity, symbols, size maps for markers in all maps

color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
poor_school_color = poorschools['Present'].map(color_map)
affluent_school_color = affluentschools['Present'].map(color_map)

line_color_map = {1 : 'darkmagenta', 0 : 'darkgreen'}
poor_school_line_color = poorschools['Present'].map(line_color_map)
affluent_school_line_color = affluentschools['Present'].map(line_color_map)

opacity_map = {1 : 0.75, 0 : 0.25}
poor_school_opacity = poorschools['Present'].map(opacity_map)
affluent_school_opacity = affluentschools['Present'].map(opacity_map)

symbol_map = {1 : 'circle', 0 : 'circle-open'}
poor_school_symbol = poorschools['Present'].map(symbol_map)
affluent_school_symbol = affluentschools['Present'].map(symbol_map)

size_map = {1 : 5, 0 : 4}
poor_school_size = poorschools['Present'].map(size_map)
affluent_school_size = affluentschools['Present'].map(size_map)

# Set up map hover information

def custom_hover_text(row):
    # Plotly treats <br> as a line break in hover text
    school_and_city = f"<br>{row['School'].title()}<br>{row['City'].title()}, {row['State_AP']}"
    if row['Present'] == 1:
        # Schools with an SNO site also show the publication name and URL
        return f"{school_and_city}<br>{row['Media']}<br>{row['URL']}"
    return school_and_city

# **************************************** F I R S T   R O W ****************************************

# Styling the map

fig4.add_trace(
    go.Scattergeo(lon = poorschools['X'],
                  lat = poorschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = poorschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = poor_school_color,
                                line_color = poor_school_line_color,
                                opacity = poor_school_opacity,
                                symbol = poor_school_symbol,
                                size = poor_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 1, col = 1)  

# Calculate % of schools with and without SNO sites

poor_have_media_percent = round(((poorschools['Present'] == 1).mean() * 100),1)
poor_have_not_media_percent = round(((poorschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig4.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [poor_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = poor_have_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

fig4.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [poor_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = poor_have_not_media_percent,
    hoverinfo = 'none'
), row = 1, col = 2)

# **************************************** S E C O N D   R O W **************************************** 

# Styling the map

fig4.add_trace(
    go.Scattergeo(lon = affluentschools['X'],
                  lat = affluentschools['Y'],
                  mode = 'markers',
                  hoverinfo = 'text',
                  hovertext = affluentschools.apply(custom_hover_text, axis=1),
                  marker = dict(color = affluent_school_color,
                                line_color = affluent_school_line_color,
                                opacity = affluent_school_opacity,
                                symbol = affluent_school_symbol,
                                size = affluent_school_size),
                  textfont_family = 'Arial',
                  showlegend = False
                 ),
    row = 6, col = 1)  

# Calculate % of schools with and without SNO sites

affluent_have_media_percent = round(((affluentschools['Present'] == 1).mean() * 100),1)
affluent_have_not_media_percent = round(((affluentschools['Present'] == 0).mean() * 100),1)

# Styling the bar graph

fig4.add_trace(go.Bar(
    x = ['With a SNO Site'],
    y = [affluent_have_media_percent],
    name = 'With a SNO Site',
    marker_color = 'darkmagenta',
    text = affluent_have_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

fig4.add_trace(go.Bar(
    x = ['Without a SNO Site'],
    y = [affluent_have_not_media_percent],
    name = 'Without a SNO Site',
    marker_color = 'darkgreen',
    text = affluent_have_not_media_percent,
    hoverinfo = 'none',
    legend = 'legend2'
), row = 6, col = 2)

# Show visualizations

# fig4.show()

# **High School News Websites: Where Are They and Where Are They Not**

## **Visualizing the Penetration of Student Newspapers Online**

#### There are approximately 22,000 public high schools in the United States. In about 8% of these high schools, student journalists publish their work using a news website hosted by the company [Student Newspapers Online](https://snosites.com/) (SNO).

#### SNO is the largest host of student news products in the United States. Looking at where schools with these websites are located, and at what types of schools tend to have them, offers insight into where student journalism tends to be supported and where it is lacking.
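#### The percentages in this notebook are shares of schools within a group. A minimal sketch of that calculation, using a hypothetical toy dataframe with the notebook's `Present` column (1 = has an SNO site, 0 = does not); the helper name `sno_share` is illustrative and not part of the notebook:

```python
import pandas as pd

# Hypothetical toy data: 'Present' is 1 when a school has an SNO site, 0 otherwise
schools = pd.DataFrame({'Present': [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]})

def sno_share(df: pd.DataFrame) -> tuple[float, float]:
    """Return (% with an SNO site, % without), rounded to one decimal place."""
    # The mean of a boolean Series is the fraction of True values
    with_site = float(round((df['Present'] == 1).mean() * 100, 1))
    without_site = float(round((df['Present'] == 0).mean() * 100, 1))
    return with_site, without_site

print(sno_share(schools))  # (20.0, 80.0)

# Rough count implied by the prose figures: about 8% of roughly 22,000 high schools
print(round(0.08 * 22_000))  # 1760
```

#### Because the same boolean-mean pattern works for any subset of rows, it can be applied unchanged to each group of schools compared below.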

#### The map below shows the locations of all the public high schools in the United States, highlighting the schools with SNO websites.

In [None]:
fig1.show()

## Differences by School Location

#### Schools with SNO websites, and the student journalism programs behind them, are not distributed evenly across the country. Schools in rural areas and in urban centers are less likely than schools in the suburbs to support student journalism.
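#### One way to compare locales is to group schools by locale type and take the share with an SNO site in each group. The sketch below assumes a hypothetical `LOCALE` column holding two-digit NCES locale codes, whose first digit identifies city, suburb, town, or rural; the toy values are invented, and how this notebook folds towns into its three locale maps is not shown here.

```python
import pandas as pd

# Hypothetical toy data. 'LOCALE' is an assumed column of two-digit NCES locale
# codes (11-13 city, 21-23 suburb, 31-33 town, 41-43 rural); 'Present' is 1
# when the school has an SNO site.
schools = pd.DataFrame({
    'LOCALE':  [11, 12, 21, 21, 22, 41, 42, 43, 31, 13],
    'Present': [0,  1,  1,  1,  0,  0,  0,  1,  0,  0],
})

# The first digit of an NCES locale code identifies the broad locale type
LOCALE_TYPES = {1: 'City', 2: 'Suburb', 3: 'Town', 4: 'Rural'}
schools['Locale_Type'] = (schools['LOCALE'] // 10).map(LOCALE_TYPES)

# Percent of schools in each locale type that have an SNO site
share_by_locale = (
    schools.groupby('Locale_Type')['Present']
    .mean()
    .mul(100)
    .round(1)
)
print(share_by_locale)
```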

#### The next three maps and bar graphs illustrate the discrepancies in the locations of student journalism programs.

In [None]:
fig2.show()

## Differences in Student Populations: Race / Ethnicity

#### Student population characteristics also differentiate schools that are more likely to support student journalism and thus have SNO websites from schools that tend not to do so. 

#### Schools are grouped by the combined racial and ethnic makeup of their students. A majority White school is one where 50% or more of the students identify as White. A majority non-White school is one where fewer than 50% of the students identify as White, meaning that more than half identify as a race or ethnicity other than White.
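#### A minimal sketch of a 50% split that places every school in exactly one group, assuming a hypothetical `WH` column for the number of White students alongside the `TOTAL` enrollment column used elsewhere in this notebook. The toy values are invented for illustration:

```python
import pandas as pd

# Hypothetical toy data: 'WH' (White students) is an assumed column name;
# 'TOTAL' is total enrollment, as in the notebook's dataframe.
schools = pd.DataFrame({
    'School': ['A', 'B', 'C', 'D'],
    'WH':     [300, 100, 250, 0],
    'TOTAL':  [400, 400, 500, 350],
})

schools['White_Percent'] = schools['WH'] / schools['TOTAL'] * 100

# One boolean mask and its negation: every school lands in exactly one group,
# including a school at exactly 50% (school C)
is_majority_white = schools['White_Percent'] >= 50
white_schools = schools[is_majority_white]
nonwhite_schools = schools[~is_majority_white]

print(white_schools['School'].tolist())     # ['A', 'C']
print(nonwhite_schools['School'].tolist())  # ['B', 'D']
```

#### One caveat of this design: a school with missing enrollment yields a missing percentage, which the negated mask sends to the second group, so such rows may need to be dropped first.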

#### The next two maps and bar graphs illustrate the discrepancies in the racial/ethnic makeup of schools with and without SNO websites.

In [None]:
fig3.show()

## Differences in Student Populations: Socioeconomics

#### The socioeconomic makeup of student populations is another way to differentiate schools that are more likely and less likely to have student journalism programs and SNO websites.

#### The share of a school's students who are eligible for free or reduced-price lunch is often used in education research as a proxy for the socioeconomic makeup of its student body. In this project, schools in which more than 50% of students are eligible for free or reduced-price lunch are characterized as majority poor schools. Schools in which 50% or less of the student body is eligible are characterized as majority affluent schools.
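#### The classification can be sketched with the column names the notebook uses (`TOTFRL`, `TOTAL`, `Lunch_Percent`) on invented toy data. Unlike the notebook, this sketch assumes that negative counts are missing-data codes (as in some NCES files) and sets them, along with zero enrollments, to missing before dividing; whether the merged CSV already handles this is not shown here.

```python
import pandas as pd

# Hypothetical toy data using the notebook's column names: TOTFRL (students
# eligible for free or reduced-price lunch) and TOTAL (total enrollment)
schools = pd.DataFrame({
    'School': ['A', 'B', 'C', 'D'],
    'TOTFRL': [300, 200, 50, -1],   # -1 stands in for an assumed missing-data code
    'TOTAL':  [400, 400, 500, 350],
})

# Assumption: negative counts and zero enrollments are treated as missing
frl = schools['TOTFRL'].where(schools['TOTFRL'] >= 0)
total = schools['TOTAL'].where(schools['TOTAL'] > 0)
schools['Lunch_Percent'] = (frl / total * 100).round(1)

# Same thresholds as the notebook: more than 50% is majority poor, 50% or less
# is majority affluent. Missing percentages fall in neither group, because
# comparisons with NaN evaluate to False.
poor = schools[schools['Lunch_Percent'] > 50]
affluent = schools[schools['Lunch_Percent'] <= 50]

print(poor['School'].tolist())      # ['A']
print(affluent['School'].tolist())  # ['B', 'C']
```

#### School B sits exactly at 50% and, under the `> 50` rule, is counted as majority affluent.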

#### The last two maps and bar graphs illustrate the discrepancies in the socioeconomic makeup of schools with and without SNO websites.

In [None]:
fig4.show()