<!DOCTYPE HTML>
<html lang="en">
<head>
<title>Aron Frishberg - Datasets</title>
<meta charset="UTF-8">
<meta name="description" content="Explore a curated collection of data sets by Aron Frishberg, featuring rich insights into Nobel Prize academic affiliations, Billboard Top 100 artist labels, comprehensive US News College Rankings, and university hex colors. Each dataset, meticulously compiled and hosted on GitHub, serves as a foundational resource for academic research, data analysis projects, and educational exploration. From the dynamics of the music industry to detailed academic and color code data for top universities, Aron's datasets offer a wealth of information for programmers, researchers, and enthusiasts alike.">
<meta name="keywords" content="aronfrishberg, aron, frishberg, programming, data, analytics, smart, intelligent, projects, math, computer science, university of chicago, uchicago, business, datasets, data, Aron Frishberg, data sets, GitHub repositories, Nobel Prize affiliations, Billboard Top 100, US News College Rankings, university hex colors, educational data, music industry analytics, academic research, data analytics projects, open-source data, programming and data analysis, data science, university of Chicago projects, data visualization, academic affiliations dataset, record label dataset, college ranking data, hex color codes for universities">
<meta name="author" content="Aron Frishberg">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link
href='https://fonts.googleapis.com/css?family=Playfair+Display:400,700,900,400italic,700italic,900italic|Droid+Serif:400,700,400italic,700italic'
rel='stylesheet' type='text/css'>
<link rel="shortcut icon" href="css/icon.ico" />
<link rel="stylesheet" href="css/index.css">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.0.0-beta3/css/all.min.css">
<script src="js/index.js"></script>
</head>
<body>
<header>
<h1>ARON FRISHBERG</h1>
</header>
<div class="links">
<a href="datasets.html" id="current-page">Datasets</a>
<a href="projects.html">Projects</a>
<a href="experience.html">Experience</a>
<a href="about.html">About</a>
</div>
<div class="date-line">
<p id="date-element"></p>
</div>
<!--articles-->
<center>
<div class="dataset-listings">
<a href="https://github.com/frishberg/Ivy-Plus-Endowment-Data">
<div class="listing">
<img src="img/dataset-thumbnails/endowments.PNG" style="float:left; margin-right: 10px; border-radius:25px;" alt="Ivy Plus Endowment Data Thumbnail">
<h2>Ivy Plus Endowment Data (2014 - 2024)</h2>
<p>For this repo, I manually went through the endowment data of all Ivy Plus schools (the Ivy League plus UChicago, Stanford, Duke, Caltech, Johns Hopkins, and MIT) and compiled it into a .csv and a .xlsx file. The data covers 2014 - 2024 and contains the endowment value of each school for each year. It is useful for anyone interested in the finances of these schools, or of universities in general. This data is also used in my "Graph of Ivy Plus Endowments" project, which can be found on the Projects page.
</p>
</div>
</a>
<a href="https://github.com/frishberg/Record-Labels-of-the-Top-100-Artists">
<div class="listing">
<img src="img/dataset-thumbnails/billboard top 100.jpg" style="float:left; margin-right: 10px; border-radius:25px;" alt="Billboard Top 100 Dataset Thumbnail">
<h2>Record Labels of the Billboard Top 100 Artists</h2>
<p>This repo contains the current list of Billboard Top 100 Artists and their record labels, as listed on Wikipedia. The scraping function runs daily through GitHub (at midnight UTC), so the data stays current. Because the data is scraped from Wikipedia, an openly editable website, it contains some errors, but it is accurate for the most part.
</p>
</div>
</a>
<a href="https://github.com/frishberg/Nobel-Prizes-by-Academic-Affiliations">
<div class="listing">
<img src="img/dataset-thumbnails/nobel prize data.png" style="float:left; margin-right: 10px; border-radius:25px;" alt="Nobel Prize Dataset Thumbnail">
<h2>Nobel Prizes by Academic Affiliations</h2>
<p>This repo contains the lifetime academic affiliations of all Nobel laureates (1901 - 2023). Academic affiliations are divided into alma maters and institutions. For example, Moungi Bawendi (2023) received his PhD from UChicago and is now a professor at MIT, so both are listed in their respective categories. Note that this dataset is only as accurate as Wikipedia. For example, UChicago has 99 Nobel Prizes as of 2023, yet this dataset contains only 96, because not all laureates have all of their academic affiliations listed on Wikipedia. To see this data visually displayed, check out "Universities with the Most Nobel Prizes" on Projects.
</p>
</div>
</a>
<a href="https://github.com/frishberg/Archive-of-US-News-College-Rankings">
<div class="listing">
<img src="img/dataset-thumbnails/usnews logo.png" style="float:left; margin-right: 10px; border-radius:25px;" alt="US News College Rankings Dataset Thumbnail">
<h2>Archive of U.S. News College Rankings (1984 - 2024)</h2>
<p>This repo contains an archive of all US News national college rankings from 1984 to 2024. I collected the ranking data from several sources (including publicuniversityhonors.com and andyreiter.com/datasets) using Python and the Selenium library.
The dataset contains ~6000 data points in total and is provided in several formats (.csv, .json, .xlsx) so that it can be used from any programming language. To see this visually displayed, check out the "Graphing Tool" on Projects.</p>
</div>
</a>
<a href="https://github.com/frishberg/University-Hex-Colors">
<div class="listing">
<img src="img/dataset-thumbnails/uchicago logo.png" style="float:left; margin-right: 10px; border-radius:25px;" alt="University Hex Colors Dataset Thumbnail">
<h2>University Hex Colors</h2>
<p>This repo contains the primary hex color codes associated with 100+ universities in the United States. I initially collected this data using OpenAI's GPT-4 model, then checked each value by hand to ensure accuracy.
I stored the data in a .json file so that it is easy to use from any programming language, and easy to convert if a language does not support .json. This data is also used in my US News Graphing Tool to determine the color of each school's graph.
</p>
</div>
</a>
<a href="https://github.com/frishberg/US-College-Ranking-Data">
<div class="listing">
<img src="img/dataset-thumbnails/college data spreadsheet.PNG" style="float:left; margin-right: 10px; border-radius:25px;" alt="College Ranking Data Thumbnail">
<h2>U.S. College Ranking Data</h2>
<p>This repo contains data points on many of the top universities in the United States. I spent weeks writing Python programs (using the selenium and requests libraries) to scrape data points including
all Niche rankings (campus food quality, safety, professors, ...), acceptance rate, average SAT, number of Nobel Prizes won, international rankings, CS rankings, and more. I collected all of this data to create a custom college ranking site, but decided to publish it as a standalone dataset as well, for others to use.
</p>
</div>
</a>
</div>
<p id="bottom-spacer"></p>
</center>
<!--articles-->
</body>
</html>