Some Questions About the Data #32

Open
idontgetoutmuch opened this Issue Feb 17, 2018 · 1 comment

idontgetoutmuch commented Feb 17, 2018

I have downloaded the historical dataset for Old Faithful from http://www.geysertimes.org/archive/geysers/Old_Faithful_eruptions.tsv.gz, but I am struggling to understand what some of the columns mean.

In particular, here are the header and the first row:

eruptionID	geyser	eruption_time_epoch	has_seconds	exact	ns	ie	E	A	wc	ini	maj	min	q	duration	entrant	observer	eruption_comment	time_updated	time_entered	associated_primaryID	other_comments	Old_Faithful_Preplay_Time_VEC	Old_Faithful_Height_VEC
23132	Old Faithful	10506540	0	1	0	0	0	0	0	0	1	0	0	4min	BoekelUpload	OFVCL-EV		1335129843	1335129843	23132	NULL	NULL	NULL

Did the dataset really begin on the date given by eruption_time_epoch?

*Main> epochToUTC 10506540
1970-05-02 14:29:00 UTC

And if so, what do time_updated and time_entered mean? Both are 1335129843, which converts to:

*Main> epochToUTC 1335129843
2012-04-22 21:24:03 UTC

Perhaps the data was collected in 1970 but only added to your excellent site in 2012?
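The `epochToUTC` helper used in these GHCi sessions is not shown in the issue. A minimal sketch, assuming it simply treats the column value as whole seconds since 1970-01-01 00:00:00 UTC (Unix time); this is a hypothetical reconstruction built on `posixSecondsToUTCTime` from the `time` package:

```haskell
import Data.Time (UTCTime)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- Hypothetical reconstruction of epochToUTC: interpret an integer as
-- whole seconds since the Unix epoch. Importing Data.Time also brings
-- UTCTime's Show instance into scope, which prints values such as
-- "1970-05-02 14:29:00 UTC".
epochToUTC :: Integer -> UTCTime
epochToUTC = posixSecondsToUTCTime . fromInteger
```

Under that assumption it reproduces the conversions above, e.g. `epochToUTC 10506540` shows as `1970-05-02 14:29:00 UTC`, which suggests the columns really are plain Unix timestamps.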

By row 86360 the consistency (?) seems to have improved:

86360	Old Faithful	1310155500	0	1	0	0	0	0	0	0	0	1	0	1m46s	BoekelUpload	OFVCL-EV	(160+ft)	1352243080	1352243080	86360	NULL	NULL	NULL

So the eruption_time_epoch is

*Main> epochToUTC 1310155500
2011-07-08 20:05:00 UTC

and the time_updated and time_entered are

*Main> epochToUTC 1352243080
2012-11-06 23:04:40 UTC
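The gap between when an eruption happened and when its row was entered can be computed per row by looking columns up by their header names. A sketch, assuming both eruption_time_epoch and time_entered are Unix timestamps in seconds; the helper names `splitTabs`, `column` and `entryLagDays` are hypothetical:

```haskell
import Text.Read (readMaybe)

-- Split one line of the .tsv on tab characters, keeping empty fields
-- (the first row has an empty eruption_comment between two tabs).
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (f, [])       -> [f]
  (f, _ : rest) -> f : splitTabs rest

-- Look up a column in a row by its name in the header line.
column :: [String] -> [String] -> String -> Maybe String
column header row name = lookup name (zip header row)

-- Days between an eruption and the moment its row was entered.
-- Returns Nothing if either field is missing or not a number (e.g. "NULL").
entryLagDays :: [String] -> [String] -> Maybe Double
entryLagDays header row = do
  erupted <- readMaybe =<< column header row "eruption_time_epoch"
  entered <- readMaybe =<< column header row "time_entered"
  pure (fromIntegral (entered - erupted :: Integer) / 86400)
```

If the assumption holds, the first row was entered about 15,331 days (roughly 42 years) after the eruption it records, and row 86360 about 487 days after.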

Also, the number of missing duration entries seems to have increased over time. Is there a reason for this?
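One way to check whether missing durations really become more common is to tally them by calendar year. A minimal sketch, assuming a missing duration appears as an empty field or the literal `NULL` (the issue does not say how it is encoded) and that eruption_time_epoch is Unix seconds; `isMissing`, `yearOf` and `missingByYear` are hypothetical names:

```haskell
import qualified Data.Map.Strict as Map
import Data.Time (toGregorian, utctDay)
import Data.Time.Clock.POSIX (posixSecondsToUTCTime)

-- Assumed encoding of a missing duration: empty field or "NULL".
isMissing :: String -> Bool
isMissing d = null d || d == "NULL"

-- Calendar year (UTC) of a Unix timestamp in seconds.
yearOf :: Integer -> Integer
yearOf t = y
  where
    (y, _, _) = toGregorian (utctDay (posixSecondsToUTCTime (fromInteger t)))

-- Given (eruption_time_epoch, duration) pairs, count
-- (missing durations, total rows) for each year.
missingByYear :: [(Integer, String)] -> Map.Map Integer (Int, Int)
missingByYear = Map.fromListWith add . map toEntry
  where
    toEntry (t, d) = (yearOf t, (if isMissing d then 1 else 0, 1))
    add (m1, n1) (m2, n2) = (m1 + m2, n1 + n2)
```

Looking at the missing fraction per year, rather than the raw count, separates "more durations go unrecorded" from "more eruptions are logged overall".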

I am trying to validate the dataset that is available in the R programming language (https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/faithful.html), which I am beginning to suspect is not representative of Old Faithful's actual behaviour.
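Comparing against R's `faithful`, whose eruption durations are numeric minutes, first requires turning the free-text duration column into numbers. A sketch that handles only the formats seen in this issue ("4min", "1m46s") plus a few close variants ("4m", "30s"); the function name `parseDurationSecs` is hypothetical, and anything it does not recognise yields Nothing rather than a guess:

```haskell
import Data.Char (isDigit)

-- Parse a duration string into seconds, for a small assumed set of formats:
-- "<n>min", "<n>m", "<n>s" and "<n>m<n>s". Everything else gives Nothing.
parseDurationSecs :: String -> Maybe Int
parseDurationSecs s = case span isDigit s of
  ("", _)         -> Nothing
  (m, "min")      -> Just (read m * 60)
  (m, "m")        -> Just (read m * 60)
  (n, "s")        -> Just (read n)
  (m, 'm' : rest) -> case span isDigit rest of
    (sec@(_ : _), "s") -> Just (read m * 60 + read sec)
    _                  -> Nothing
  _               -> Nothing
```

Counting how many rows come back as Nothing is a useful check in itself, since it shows how much of the column uses formats this sketch does not cover.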

idontgetoutmuch commented Feb 17, 2018

In case you are interested, this is what the R dataset looks like:

[Figure: plot of the R faithful dataset]

The x-axis is duration and the y-axis is the gap between eruptions.
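To put the GeyserTimes data on the same axes for comparison, each eruption's duration can be paired with the time until the next recorded eruption. A sketch, assuming durations are already in seconds, that eruption_time_epoch is Unix seconds, and that consecutive timestamps are consecutive eruptions (pairing each duration with the preceding gap would be an equally plausible convention). Any hole in the record shows up as an implausibly long gap, so some filtering would likely be needed. The name `durationVsGap` is hypothetical:

```haskell
import Data.List (sortOn)

-- Given (eruption_time_epoch, duration in seconds) pairs, return
-- (duration in minutes, minutes until the next recorded eruption)
-- for every eruption except the last.
durationVsGap :: [(Integer, Double)] -> [(Double, Double)]
durationVsGap rows = zipWith pair sorted (drop 1 sorted)
  where
    sorted = sortOn fst rows
    pair (t1, d1) (t2, _) = (d1 / 60, fromIntegral (t2 - t1) / 60)
```

Sorting by timestamp first means the result does not depend on the row order in the file.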