Commit

Use to_datetime() in Unix timestamp example instead of .astype('datetime64'). Fixes #2.
jvns committed Jan 3, 2014
1 parent 67211c4 commit ad59b22
Showing 1 changed file with 32 additions and 138 deletions.
170 changes: 32 additions & 138 deletions cookbook/Chapter 8 - How to deal with timestamps.ipynb
@@ -166,17 +166,15 @@
"source": [
"Every numpy array and pandas series has a dtype -- this is usually `int64`, `float64`, or `object`. Some of the time types available are `datetime64[s]`, `datetime64[ms]`, and `datetime64[us]`. There are also `timedelta` types, similarly.\n",
"\n",
"Normally you can convert between dtypes using the `astype()` method, but for some reason that throws an error for me right now.\n",
"\n",
"Changing the dtype manually seems to work:"
"We can use the `pd.to_datetime` function to convert our integer timestamps into datetimes. This is a constant-time operation -- we're not actually changing any of the data, just how pandas thinks about it."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"popcon['atime'].dtype = dtype('datetime64[s]')\n",
"popcon['ctime'].dtype = dtype('datetime64[s]')"
"popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')\n",
"popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')"
],
"language": "python",
"metadata": {},
@@ -187,87 +185,24 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Huh, this is weird! The dates still aren't showing up properly."
"If we look at the dtype now, it's `<M8[ns]`. As far as I can tell `M8` is secret code for `datetime64`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"popcon[:5]"
"popcon['atime'].dtype"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<div style=\"max-height:1000px;max-width:1500px;overflow:auto;\">\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>atime</th>\n",
" <th>ctime</th>\n",
" <th>package-name</th>\n",
" <th>mru-program</th>\n",
" <th>tag</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td> 1387295797</td>\n",
" <td> 1367633260</td>\n",
" <td> perl-base</td>\n",
" <td> /usr/bin/perl</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td> 1387295796</td>\n",
" <td> 1354370480</td>\n",
" <td> login</td>\n",
" <td> /bin/su</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td> 1387295743</td>\n",
" <td> 1354341275</td>\n",
" <td> libtalloc2</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td> 1387295743</td>\n",
" <td> 1387224204</td>\n",
" <td> libwbclient0</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libwbclient.so.0</td>\n",
" <td> &lt;RECENT-CTIME&gt;</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td> 1387295742</td>\n",
" <td> 1354341253</td>\n",
" <td> libselinux1</td>\n",
" <td> /lib/x86_64-linux-gnu/libselinux.so.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
" atime ctime package-name mru-program tag\n",
"0 1387295797 1367633260 perl-base /usr/bin/perl NaN\n",
"1 1387295796 1354370480 login /bin/su NaN\n",
"2 1387295743 1354341275 libtalloc2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7 NaN\n",
"3 1387295743 1387224204 libwbclient0 /usr/lib/x86_64-linux-gnu/libwbclient.so.0 <RECENT-CTIME>\n",
"4 1387295742 1354341253 libselinux1 /lib/x86_64-linux-gnu/libselinux.so.1 NaN"
"dtype('<M8[ns]')"
]
}
],
@@ -277,17 +212,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Copying the dataframe in this way seems to fix that, though. I really don't know what's going on here.\n",
"\n",
"This is kind of typical of my experience with pandas, actually -- mostly it works great, but sometimes it doesn't work in strange ways that I don't understand."
"So now we can look at our `atime` and `ctime` as dates!"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"popcon = pd.concat([popcon[col] for col in popcon.columns], axis=1)\n",
"popcon[:10]"
"popcon[:5]"
],
"language": "python",
"metadata": {},
@@ -311,80 +243,40 @@
" <th>0</th>\n",
" <td>2013-12-17 15:56:37</td>\n",
" <td>2013-05-04 02:07:40</td>\n",
" <td> perl-base</td>\n",
" <td> /usr/bin/perl</td>\n",
" <td> perl-base</td>\n",
" <td> /usr/bin/perl</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2013-12-17 15:56:36</td>\n",
" <td>2012-12-01 14:01:20</td>\n",
" <td> login</td>\n",
" <td> /bin/su</td>\n",
" <td> login</td>\n",
" <td> /bin/su</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2013-12-17 15:55:43</td>\n",
" <td>2012-12-01 05:54:35</td>\n",
" <td> libtalloc2</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7</td>\n",
" <td> libtalloc2</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2013-12-17 15:55:43</td>\n",
" <td>2013-12-16 20:03:24</td>\n",
" <td> libwbclient0</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libwbclient.so.0</td>\n",
" <td> libwbclient0</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libwbclient.so.0</td>\n",
" <td> &lt;RECENT-CTIME&gt;</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2013-12-17 15:55:42</td>\n",
" <td>2012-12-01 05:54:13</td>\n",
" <td> libselinux1</td>\n",
" <td> /lib/x86_64-linux-gnu/libselinux.so.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2013-12-17 15:55:42</td>\n",
" <td>2012-12-01 05:54:35</td>\n",
" <td> libstdc++6</td>\n",
" <td> /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2013-12-17 15:55:40</td>\n",
" <td>2013-12-16 20:03:22</td>\n",
" <td> libpam-winbind</td>\n",
" <td> /lib/x86_64-linux-gnu/security/pam_winbind.so</td>\n",
" <td> &lt;RECENT-CTIME&gt;</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2013-12-17 15:55:40</td>\n",
" <td>2012-12-01 05:54:13</td>\n",
" <td> libpam-modules</td>\n",
" <td> /lib/x86_64-linux-gnu/security/pam_unix.so</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2013-12-17 15:55:40</td>\n",
" <td>2012-12-01 05:54:13</td>\n",
" <td> libpam-ck-connector</td>\n",
" <td> /lib/security/pam_ck_connector.so</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>2013-12-17 15:55:40</td>\n",
" <td>2012-12-01 05:54:13</td>\n",
" <td> libpam-cap</td>\n",
" <td> /lib/x86_64-linux-gnu/security/pam_cap.so</td>\n",
" <td> libselinux1</td>\n",
" <td> /lib/x86_64-linux-gnu/libselinux.so.1</td>\n",
" <td> NaN</td>\n",
" </tr>\n",
" </tbody>\n",
@@ -395,17 +287,12 @@
"output_type": "pyout",
"prompt_number": 7,
"text": [
" atime ctime package-name mru-program tag\n",
"0 2013-12-17 15:56:37 2013-05-04 02:07:40 perl-base /usr/bin/perl NaN\n",
"1 2013-12-17 15:56:36 2012-12-01 14:01:20 login /bin/su NaN\n",
"2 2013-12-17 15:55:43 2012-12-01 05:54:35 libtalloc2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7 NaN\n",
"3 2013-12-17 15:55:43 2013-12-16 20:03:24 libwbclient0 /usr/lib/x86_64-linux-gnu/libwbclient.so.0 <RECENT-CTIME>\n",
"4 2013-12-17 15:55:42 2012-12-01 05:54:13 libselinux1 /lib/x86_64-linux-gnu/libselinux.so.1 NaN\n",
"5 2013-12-17 15:55:42 2012-12-01 05:54:35 libstdc++6 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.16 NaN\n",
"6 2013-12-17 15:55:40 2013-12-16 20:03:22 libpam-winbind /lib/x86_64-linux-gnu/security/pam_winbind.so <RECENT-CTIME>\n",
"7 2013-12-17 15:55:40 2012-12-01 05:54:13 libpam-modules /lib/x86_64-linux-gnu/security/pam_unix.so NaN\n",
"8 2013-12-17 15:55:40 2012-12-01 05:54:13 libpam-ck-connector /lib/security/pam_ck_connector.so NaN\n",
"9 2013-12-17 15:55:40 2012-12-01 05:54:13 libpam-cap /lib/x86_64-linux-gnu/security/pam_cap.so NaN"
" atime ctime package-name mru-program tag\n",
"0 2013-12-17 15:56:37 2013-05-04 02:07:40 perl-base /usr/bin/perl NaN\n",
"1 2013-12-17 15:56:36 2012-12-01 14:01:20 login /bin/su NaN\n",
"2 2013-12-17 15:55:43 2012-12-01 05:54:35 libtalloc2 /usr/lib/x86_64-linux-gnu/libtalloc.so.2.0.7 NaN\n",
"3 2013-12-17 15:55:43 2013-12-16 20:03:24 libwbclient0 /usr/lib/x86_64-linux-gnu/libwbclient.so.0 <RECENT-CTIME>\n",
"4 2013-12-17 15:55:42 2012-12-01 05:54:13 libselinux1 /lib/x86_64-linux-gnu/libselinux.so.1 NaN"
]
}
],
@@ -415,7 +302,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Okay, great! So now I can just look at packages that aren't libraries!\n",
"Now suppose we want to look at all packages that aren't libraries.\n",
"\n",
"First, I want to get rid of everything with timestamp 0. Notice how we can just use a string in this comparison, even though it's actually a timestamp on the inside? That is because pandas is amazing."
]
@@ -431,6 +318,13 @@
"outputs": [],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can use pandas' magical string abilities to just look at rows where the package name doesn't contain 'lib'."
]
},
{
"cell_type": "code",
"collapsed": false,
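This commit switches the notebook to `pd.to_datetime(..., unit='s')`. The following minimal sketch runs that conversion on a tiny hypothetical frame. Its values are copied from the popcon output shown in the diff, but the real `popcon` data is not part of this commit. `unit='s'` tells pandas that the integers count seconds since the Unix epoch.

```python
import pandas as pd

# Hypothetical two-row stand-in for the popcon frame; the integer values
# are copied from the first two rows of the notebook output in the diff.
popcon = pd.DataFrame({
    'atime': [1387295797, 1387295796],
    'ctime': [1367633260, 1354370480],
    'package-name': ['perl-base', 'login'],
})

# Interpret the integers as seconds since 1970-01-01 00:00:00 UTC.
popcon['atime'] = pd.to_datetime(popcon['atime'], unit='s')
popcon['ctime'] = pd.to_datetime(popcon['ctime'], unit='s')

# A datetime64 dtype (the notebook output shows datetime64[ns], i.e. '<M8[ns]').
print(popcon['atime'].dtype)

print(popcon.loc[0, 'atime'])  # 2013-12-17 15:56:37
print(popcon.loc[0, 'ctime'])  # 2013-05-04 02:07:40
```

The printed timestamps match the first row of the converted table in the diff.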

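The notebook guesses that `M8` is "secret code" for `datetime64`. A short NumPy check supports that reading:

- `M` is NumPy's type character for datetime64 (lowercase `m` is timedelta64).
- `8` is the item size in bytes.
- `<` marks little-endian byte order.

```python
import numpy as np

dt = np.dtype('datetime64[ns]')
print(dt.kind)      # M: NumPy's type character for datetime64
print(dt.itemsize)  # 8: each value is stored as a 64-bit integer
print(dt.str)       # <M8[ns] on little-endian machines ('<' is the byte order)

# 'M8[ns]' is another spelling of the same native-order dtype.
print(np.dtype('M8[ns]') == dt)  # True

# timedelta64 uses the lowercase type character 'm'.
print(np.dtype('timedelta64[s]').kind)  # m
```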

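The last markdown cells describe two filters:

1. Dropping rows with timestamp 0 by comparing the datetime column against a plain string.
2. Keeping only packages whose names do not contain 'lib'.

The code cells that do this are collapsed in this diff, so the following is only a hypothetical reconstruction on made-up rows. The package name `example-pkg` and the cutoff string `'1970-01-01'` are assumptions. It uses a datetime-to-string comparison and the `.str.contains` accessor.

```python
import pandas as pd

# Made-up rows; the third has timestamp 0, which becomes 1970-01-01 00:00:00.
popcon = pd.DataFrame({
    'atime': pd.to_datetime([1387295797, 1387295743, 0], unit='s'),
    'package-name': ['perl-base', 'libtalloc2', 'example-pkg'],
})

# pandas parses the string as a timestamp for the comparison,
# so this keeps only rows strictly after the epoch.
popcon = popcon[popcon['atime'] > '1970-01-01']

# Vectorized string method: keep rows whose name does not contain 'lib'.
nonlib = popcon[~popcon['package-name'].str.contains('lib')]
print(nonlib['package-name'].tolist())  # ['perl-base']
```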