Here you will find metadata in a computer-friendly format about Amazon Web Services' (AWS) cloud infrastructure such as:
- When a service (such as ec2 or kinesis) became available
- Which services are regionless (iam), operate in specific regions (s3) and which are dependent on availability zones within a region (ec2)
- What services are available in each region and when they became available in those regions
- How many availability zones are in a region (and when there were changes)
The dataset is most easily usable as a set of the following CSV files:
- aws/services.csv contains a "main" list of services, whether they work in regions and whether they are dependent on zones.
- aws/services-state.csv records when a service was introduced and what its availability was at that point (limited preview / limited beta, public beta or general availability/GA).
- aws/services-regions.csv covers services that operate in specific regions: where they are available and when they became available in each region.
- aws/zones.csv contains information about each region and how many availability zones it contains (and when).
This is just a dataset repository. There is nothing functional (no code) in this repository. You are free to use the data, as long as you give me credit for it, as defined by the CC BY 4.0 license.
Here are the first few lines of each file as samples:
==> aws/services-regions.csv <==
date,service,region,accurate,description
2004-11-03,sqs,us-east-1,1,http://aws.amazon.com/about-aws/whats-new/2004/11/03/introducing-the-amazon-simple-queue-service/
==> aws/services-state.csv <==
date,service,limited,beta,ga,accurate,description
2004-10-04,alexa,0,0,1,1,"Announcement http://aws.amazon.com/about-aws/whats-new/2004/10/04/introducing-the-alexa-web-information-service/ doesn't state beta, and earliest product page accessible https://web.archive.org/web/20060223102040/http://www.amazon.com/gp/browse.html?node=12782661 also doesn't state beta (GA). Since no contradictory information, assume GA on release."
==> aws/services.csv <==
service,name,hasregions,haszones,description
alexa,Alexa Web Information Service,0,0,http://aws.amazon.com/awis/
==> aws/zones.csv <==
date,region,zones,accurate,description
2007-10-22,us-east-1,1,1,Since zones were introduced 2008-03-27 before that there were ”no” zones – so one only.
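Because zones.csv records changes over time, a typical lookup is the zone count of a region on a given date. The following is a minimal sketch and not part of this repository. It assumes that each row gives the zone count in effect from its date until the next row for the same region. The helper name zones_as_of, the region example-1 and the sample rows are hypothetical and only illustrate the layout.

```python
import csv
import io
from datetime import date

# Hypothetical rows in the zones.csv layout (not real AWS data).
SAMPLE = """\
date,region,zones,accurate,description
2010-01-01,example-1,2,1,hypothetical
2012-06-15,example-1,3,0,hypothetical
"""


def zones_as_of(rows, region, day):
    """Return the zone count for `region` on `day`.

    Returns None if the region has no row dated on or before `day`.
    Assumes a row stays in effect until a later row for the same region.
    """
    latest = None
    for row in rows:
        if row["region"] != region:
            continue
        changed = date.fromisoformat(row["date"])
        if changed <= day and (latest is None or changed >= latest[0]):
            latest = (changed, int(row["zones"]))
    return None if latest is None else latest[1]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(zones_as_of(rows, "example-1", date(2011, 3, 1)))
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with `open("aws/zones.csv", newline="", encoding="utf-8")`.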
- All files start with a header row.
- All dates are ISO 8601 format (YYYY-MM-DD). Dates only.
- Boolean values (the accurate, hasregions and haszones columns) are stored as 1 for a true value and 0 for a false value.
- The description field is for human consumption only and contains further details such as links to announcements and the rationale for judgement calls.
- Zone information is included only for zones that are actually available for use as beta or GA, not for limited access (like China as of 2014-06-23). Note that govcloud is included because, although it has limited access (US public sector only), it is generally available to all those who are allowed to use it.
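The conventions above translate into a small amount of parsing code. The following is a minimal sketch and not part of this repository. It converts the date column to a date object and the three boolean columns named above to True/False, and leaves every other column as a string. The function name parse_row is an assumption made for illustration.

```python
import csv
import io
from datetime import date

# Columns documented as booleans stored as 1 (true) or 0 (false).
BOOLEAN_COLUMNS = {"accurate", "hasregions", "haszones"}


def parse_row(row):
    """Convert one csv.DictReader row into typed values."""
    typed = {}
    for key, value in row.items():
        if key == "date":
            typed[key] = date.fromisoformat(value)  # YYYY-MM-DD
        elif key in BOOLEAN_COLUMNS:
            typed[key] = value == "1"
        else:
            typed[key] = value  # left as a string
    return typed


# First lines of aws/services.csv as shown in the sample above.
SAMPLE = """\
service,name,hasregions,haszones,description
alexa,Alexa Web Information Service,0,0,http://aws.amazon.com/awis/
"""

for row in csv.DictReader(io.StringIO(SAMPLE)):
    print(parse_row(row))
```

csv.DictReader consumes the header row itself, so the same function works for any of the files as long as the column names stay as they are.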
The names and format of these data files may change in the future. Consider the dataset as a beta release :-)
There's also an aws/updated.csv file which contains
just one column and one data row, holding a single date value: the
date up to which this dataset is considered to be up to date.
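Reading that single value could look like the following minimal sketch, which is not part of this repository. It assumes the file starts with a header row like the other files. The column name and the date in the sample are placeholders, since the actual content of the file is not shown here, and the function name read_updated is hypothetical.

```python
import csv
import io
from datetime import date


def read_updated(stream):
    """Return the single date stored in an updated.csv-style stream."""
    reader = csv.reader(stream)
    next(reader)  # skip the header row (assumed to be present)
    return date.fromisoformat(next(reader)[0])


# Placeholder content: the column name and the date are assumptions.
SAMPLE = "date\n2000-01-01\n"

print(read_updated(io.StringIO(SAMPLE)))
```

For the real file, pass `open("aws/updated.csv", newline="", encoding="utf-8")` instead of the in-memory sample.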
I needed this metadata for a research project (and couldn't find suitable data online), and it was quite a bit of effort (about 4-5 days of work) so I wanted to share the data in the hope this will save someone a lot of work and someone will find it useful. (I'd be delighted to hear if you do find it useful.)
To collect the data I primarily went through all news items from AWS. I used the Internet Archive's Wayback Machine a lot to check older versions of product pages, FAQs, infrastructure information etc. I also searched through AWS developer forum announcements. Jeff Barr's posts on the AWS blog were often also a useful source of information.
If you spot an error in the data, please do one of:
- Edit the master version (the .ods file), run make to update the CSV versions of the files and submit a pull request for those changes (you'll need unoconv installed for this to work), or
- Edit the CSV in GitHub's editor and send it as a pull request, or
- Open an issue, send email, or otherwise get in touch with the updated information.
All dates are in ISO 8601 format (YYYY-MM-DD). Many files contain an
accurate column which should be 1 (TRUE value) if the date can be
confirmed from sources and 0 if it is interpolated from available
data.
Please note that all data must be backed up with references. Use
the description column for references and any rationale for
judgement calls.
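Before sending a change, the rules above can be checked mechanically. The following is a minimal sketch and not part of this repository. It flags rows whose date is not a valid YYYY-MM-DD value, whose accurate value is not 0 or 1, or whose description is empty. The function names and the sample rows are hypothetical, and the second sample row is deliberately invalid.

```python
import csv
import io
import re
from datetime import date

ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(text):
    """True only for a real calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def check_rows(stream):
    """Yield (line_number, message) for every problem found."""
    reader = csv.DictReader(stream)
    for row in reader:
        line = reader.line_num  # last physical line of this record
        if "date" in row and not is_iso_date(row["date"] or ""):
            yield line, "date is not a valid YYYY-MM-DD value"
        if "accurate" in row and row["accurate"] not in ("0", "1"):
            yield line, "accurate must be 0 or 1"
        if not (row.get("description") or "").strip():
            yield line, "description (references) is empty"


# Hypothetical rows; the second one breaks all three rules.
SAMPLE = """\
date,region,zones,accurate,description
2010-01-01,example-1,2,1,hypothetical reference
2010-13-01,example-1,3,yes,
"""

for line, message in check_rows(io.StringIO(SAMPLE)):
    print(f"line {line}: {message}")
```

The regular expression is there because date.fromisoformat on newer Python versions also accepts other ISO 8601 spellings, while the dataset uses the YYYY-MM-DD form only.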
If you want to add entirely new data (as opposed to fixing errors or
providing more accurate date information) please note that to keep the
aws/updated.csv file meaningful all changes up to the latest
date must be included in the dataset.
AWS Cloud Service Metadata by Santeri Paavolainen is licensed under a Creative Commons Attribution 4.0 International License.
