# Working with Dates and Times in Python
Run the hidden code cell below to import the data used in this course.

In [4]:
# Importing packages
import pandas as pd
import matplotlib.pyplot as plt
from datetime import date, datetime, timezone, timedelta
from dateutil import tz
import pickle

# Import datasets
rides = pd.read_csv('datasets/capital-onebike.csv')
with open('datasets/florida_hurricane_dates.pkl', 'rb') as f:
    florida_hurricane_dates = pickle.load(f)
florida_hurricane_dates = sorted(florida_hurricane_dates)

## Explore Datasets
Use the DataFrames imported in the first cell to explore the data and practice your skills!
- Count how many hurricanes made landfall each year in Florida using `florida_hurricane_dates`.
- Reload the dataset `datasets/capital-onebike.csv` so that it correctly parses date and time columns.
- Calculate the average trip duration of bike rentals on weekends in `rides`. Compare it with the average trip duration of bike rentals on weekdays.

In [7]:
# Import datetime
from datetime import datetime

# Create a datetime object
dt = datetime(2017, 12, 31, 15, 19, 13)

print(dt)


2017-12-31 15:19:13


In [8]:
# Replace the year with 1917
dt_old = dt.replace(year=1917)

# Print the result (str() uses a space where ISO 8601 uses 'T')
print(dt_old)

1917-12-31 15:19:13
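
As a small check on the two cells above: `.replace()` returns a new datetime and leaves `dt` untouched, and `print()` shows the `str()` form with a space, whereas `.isoformat()` puts a `T` between the date and the time. The variable names mirror the cells above.

```python
from datetime import datetime

dt = datetime(2017, 12, 31, 15, 19, 13)

# .replace() returns a new object; dt itself is unchanged
dt_old = dt.replace(year=1917)

print(dt.year, dt_old.year)   # 2017 1917
print(str(dt_old))            # 1917-12-31 15:19:13
print(dt_old.isoformat())     # 1917-12-31T15:19:13
```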


# Counting events before and after noon
In this chapter, you will be working with a list of all bike trips for one Capital Bikeshare bike, W20529, from October 1, 2017 to December 31, 2017. This list has been loaded as onebike_datetimes.

Each element of the list is a dictionary with two entries: start is a datetime object corresponding to the start of a trip (when a bike is removed from the dock) and end is a datetime object corresponding to the end of a trip (when a bike is put back into a dock).

You can use this data set to understand better how this bike was used. Did more trips start before noon or after noon?

In [None]:
# Create dictionary to hold results
trip_counts = {'AM': 0, 'PM': 0}
  
# Loop over all trips
for trip in onebike_datetimes:
  # Check to see if the trip starts before noon
  if trip['start'].hour < 12:
    # Increment the counter for before noon
    trip_counts['AM'] += 1
  else:
    # Increment the counter for after noon
    trip_counts['PM'] += 1
  
print(trip_counts)

NameError: name 'onebike_datetimes' is not defined

In [None]:
<script.py> output:
    {'AM': 94, 'PM': 196}

_It looks like this bike is used about twice as much after noon as it is before noon. One obvious follow-up would be to see which hours the bike is most likely to be taken out for a ride._
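
One way to sketch that follow-up is to tally trips by starting hour with `collections.Counter`. The three trips below are illustrative stand-ins in the same shape as `onebike_datetimes` (an assumption, since the real list is not loaded in this notebook).

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample in the same shape as onebike_datetimes
sample_trips = [
    {'start': datetime(2017, 10, 1, 15, 23, 25), 'end': datetime(2017, 10, 1, 15, 26, 26)},
    {'start': datetime(2017, 10, 1, 15, 42, 57), 'end': datetime(2017, 10, 1, 17, 49, 59)},
    {'start': datetime(2017, 10, 2, 6, 37, 10), 'end': datetime(2017, 10, 2, 6, 42, 53)},
]

# Count trips by the hour (0-23) in which they started
trips_per_hour = Counter(trip['start'].hour for trip in sample_trips)

# Hours sorted from most to least frequent
print(trips_per_hour.most_common())   # [(15, 2), (6, 1)]
```

Swapping `sample_trips` for the full list would give the busiest hours for the whole data set.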

# Turning strings into datetimes
When you download data from the Internet, dates and times usually come to you as strings. Often the first step is to turn those strings into datetime objects.

In this exercise, you will practice this transformation.

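Reference

| Directive | Meaning |
| --- | --- |
| `%Y` | 4-digit year (0001-9999) |
| `%m` | 2-digit month (01-12) |
| `%d` | 2-digit day (01-31) |
| `%H` | 2-digit hour (00-23) |
| `%M` | 2-digit minute (00-59) |
| `%S` | 2-digit second (00-59) |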

In [10]:
# Import the datetime class
from datetime import datetime

# Starting string, in YYYY-MM-DD HH:MM:SS format
s = '2017-02-03 00:00:01'

# Write a format string to parse s
fmt = '%Y-%m-%d %H:%M:%S'

# Create a datetime object d
d = datetime.strptime(s, fmt)

# Print d
print(d)

2017-02-03 00:00:01


In [11]:
# Starting string, in YYYY-MM-DD format
s = '2030-10-15'

# Write a format string to parse s
fmt = '%Y-%m-%d'

# Create a datetime object d
d = datetime.strptime(s, fmt)

# Print d
print(d)

2030-10-15 00:00:00


In [12]:
# Starting string, in MM/DD/YYYY HH:MM:SS format
s = '12/15/1986 08:00:00'

# Write a format string to parse s
fmt = '%m/%d/%Y %H:%M:%S'

# Create a datetime object d
d = datetime.strptime(s, fmt)

# Print d
print(d)

1986-12-15 08:00:00


_Note that strptime() accepts non-zero-padded values for directives such as %m and %d, so a string like 1/2/2018 parses with the format '%m/%d/%Y'. If a source uses a layout that strptime() cannot express, you can use other string methods to reshape the string first._
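
A short check of how `strptime()` treats a non-zero-padded string such as 1/2/2018 when given the `%m/%d/%Y` format:

```python
from datetime import datetime

# Month and day written without leading zeros
s = '1/2/2018'

# %m and %d match one- or two-digit values when parsing
d = datetime.strptime(s, '%m/%d/%Y')

print(d)   # 2018-01-02 00:00:00
```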

In [None]:
# Write down the format string
fmt = "%Y-%m-%d %H:%M:%S"

# Initialize a list for holding the pairs of datetime objects
onebike_datetimes = []

# Loop over all trips
for (start, end) in onebike_datetime_strings:
  trip = {'start': datetime.strptime(start, fmt),
          'end': datetime.strptime(end, fmt)}
  
  # Append the trip
  onebike_datetimes.append(trip)

NameError: name 'onebike_datetime_strings' is not defined
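
The loop above needs `onebike_datetime_strings`, which this notebook never defines. A minimal stand-in is sketched below, assuming the list holds `(start, end)` string pairs in the same format as the two CSV rows previewed later in this notebook.

```python
from datetime import datetime

# Assumed shape: a list of (start, end) string pairs
onebike_datetime_strings = [
    ('2017-10-01 15:23:25', '2017-10-01 15:26:26'),
    ('2017-10-01 15:42:57', '2017-10-01 17:49:59'),
]

fmt = '%Y-%m-%d %H:%M:%S'

# Parse each pair into a dictionary of datetime objects
onebike_datetimes = [
    {'start': datetime.strptime(start, fmt), 'end': datetime.strptime(end, fmt)}
    for start, end in onebike_datetime_strings
]

print(onebike_datetimes[0]['start'].isoformat())   # 2017-10-01T15:23:25
```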

# Recreating ISO format with strftime()
In the last chapter, you used strftime() to create strings from date objects. Now that you know about datetime objects, let's practice doing something similar.

Re-create the .isoformat() method, using .strftime(), and print the first trip start in our data set.

In [None]:
# Import datetime
from datetime import datetime

# Pull out the start of the first trip
first_start = onebike_datetimes[0]['start']

# Format to feed to strftime()
fmt = "%Y-%m-%dT%H:%M:%S"

# Print out date with .isoformat(), then with .strftime() to compare
print(first_start.isoformat())
print(first_start.strftime(fmt))

IndexError: list index out of range

In [None]:

<script.py> output:
    2017-10-01T15:23:25
    2017-10-01T15:23:25

# Unix timestamps
Datetimes are sometimes stored as Unix timestamps: the number of seconds since midnight UTC on January 1, 1970. This is especially common with computer infrastructure, like the log files that websites keep when they get visitors.

In [15]:
# Import datetime
from datetime import datetime

# Starting timestamps
timestamps = [1514665153, 1514664543]

# Datetime objects
dts = []

# Loop
for ts in timestamps:
  dts.append(datetime.fromtimestamp(ts))
  
# Print results
print(dts)

[datetime.datetime(2017, 12, 30, 20, 19, 13), datetime.datetime(2017, 12, 30, 20, 9, 3)]


_The largest number that some older computers can hold in one variable is 2147483647, which as a Unix timestamp falls on January 19, 2038. On that day, many computers which haven't been upgraded may fail. Hopefully, none of them are running anything critical!_
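
`datetime.fromtimestamp()` without a timezone converts to the local time of the machine running it, so the output above can differ between computers. The sketch below pins the conversion to UTC and also shows the 2038 limit, assuming a signed 32-bit counter of seconds.

```python
from datetime import datetime, timezone

timestamps = [1514665153, 1514664543]

# Passing tz makes the result independent of the machine's local timezone
dts = [datetime.fromtimestamp(ts, tz=timezone.utc) for ts in timestamps]
print(dts[0].isoformat())   # 2017-12-30T20:19:13+00:00

# Largest value a signed 32-bit integer can hold
max_int32 = 2**31 - 1
last_moment = datetime.fromtimestamp(max_int32, tz=timezone.utc)
print(max_int32, last_moment.isoformat())   # 2147483647 2038-01-19T03:14:07+00:00
```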

# Turning pairs of datetimes into durations
When working with timestamps, we often want to know how much time has elapsed between events. Thankfully, we can use datetime arithmetic to ask Python to do the heavy lifting for us so we don't need to worry about day, month, or year boundaries. Let's calculate the number of seconds that the bike was out of the dock for each trip.

Continuing our work from a previous coding exercise, the bike trip data has been loaded as the list onebike_datetimes. Each element of the list is a dictionary holding two datetime objects, corresponding to the start and end of a trip, respectively.

In [16]:
# Initialize a list for all the trip durations
onebike_durations = []

for trip in onebike_datetimes:
  # Create a timedelta object corresponding to the length of the trip
  trip_duration = trip['end'] - trip['start']
  
  # Get the total elapsed seconds in trip_duration
  trip_length_seconds = trip_duration.total_seconds()
  
  # Append the results to our list
  onebike_durations.append(trip_length_seconds)

_Remember that timedelta objects are represented in Python as a number of days, seconds, and microseconds of elapsed time. Be careful not to use .seconds on a timedelta object, since you'll just get the number of seconds without the days!_
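
A small illustration of that pitfall, using a made-up duration of one day and one second:

```python
from datetime import timedelta

# A duration of one day and one second
delta = timedelta(days=1, seconds=1)

print(delta.days)             # 1
print(delta.seconds)          # 1  (only the part left over after whole days)
print(delta.total_seconds())  # 86401.0
```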

# Average trip time
W20529 took 290 trips in our data set. How long were the trips on average? We can use the built-in Python functions sum() and len() to make this calculation.

Based on your last coding exercise, the data has been loaded as onebike_durations. Each entry is a number of seconds that the bike was out of the dock.

In [None]:
# What was the total duration of all trips?
total_elapsed_time = sum(onebike_durations)

# What was the total number of trips?
number_of_trips = len(onebike_durations)
  
# Divide the total duration by the number of trips
print(total_elapsed_time / number_of_trips)

In [None]:
<script.py> output:
    1178.9310344827586

_For the average to be a helpful summary of the data, we need all of our durations to be reasonable numbers, and not a few that are way too big, way too small, or even malformed. For example, if there is anything fishy happening in the data, and our trip ended before it started, we'd have a negative trip length._
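
A quick sanity check along those lines might look like the sketch below. The durations are made-up values (an assumption for illustration), with one negative entry standing in for a trip that appears to end before it starts.

```python
from statistics import mean, median

# Hypothetical durations in seconds; the negative one is suspicious
durations = [181.0, 7622.0, 343.0, -60.0]

# Flag any trip that seems to end before it starts
suspicious = [d for d in durations if d < 0]
print(suspicious)   # [-60.0]

# The median is less affected by extreme values than the mean
print(mean(durations), median(durations))   # 2021.5 262.0
```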

# The long and the short of why time is hard
Out of 290 trips taken by W20529, how long was the longest? How short was the shortest? Does anything look fishy?

In [None]:
# Calculate shortest and longest trips
shortest_trip = min(onebike_durations)
longest_trip = max(onebike_durations)

# Print out the results
print("The shortest trip was " + str(shortest_trip) + " seconds")
print("The longest trip was " + str(longest_trip) + " seconds")

In [None]:
<script.py> output:
    The shortest trip was -3346.0 seconds
    The longest trip was 76913.0 seconds

_Weird huh?! For at least one trip, the bike returned before it left. Why could that be? Here's a hint: it happened in early November, around 2AM local time. What happens to clocks around that time each year? By the end of the next chapter, we'll have all the tools we need to deal with this situation!_

# Creating timezone aware datetimes

In [17]:
# Import datetime, timezone
from datetime import datetime, timezone

# October 1, 2017 at 15:26:26, UTC
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=timezone.utc)

# Print results
print(dt.isoformat())

2017-10-01T15:26:26+00:00


In [18]:
# Import datetime, timedelta, timezone
from datetime import datetime, timedelta, timezone

# Create a timezone for Pacific Standard Time, or UTC-8
pst = timezone(timedelta(hours=-8))

# October 1, 2017 at 15:26:26, UTC-8
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=pst)

# Print results
print(dt.isoformat())

2017-10-01T15:26:26-08:00


In [19]:
# Import datetime, timedelta, timezone
from datetime import datetime, timedelta, timezone

# Create a timezone for Australian Eastern Daylight Time, or UTC+11
aedt = timezone(timedelta(hours=11))

# October 1, 2017 at 15:26:26, UTC+11
dt = datetime(2017, 10, 1, 15, 26, 26, tzinfo=aedt)

# Print results
print(dt.isoformat())

2017-10-01T15:26:26+11:00


_Did you know that France has the most time zones of any country, with 12, just ahead of Russia with 11? The French mainland only has one timezone, but because France has so many overseas dependencies they really add up!_

# Setting timezones
Now that you have the hang of setting timezones one at a time, let's look at setting them for the first ten trips that W20529 took.

In [None]:
# Create a timezone object corresponding to UTC-4
edt = timezone(timedelta(hours=-4))

# Loop over trips, updating the start and end datetimes to be in UTC-4
for trip in onebike_datetimes[:10]:
  # Update trip['start'] and trip['end']
  trip['start'] = trip['start'].replace(tzinfo = edt)
  trip['end'] = trip['end'].replace(tzinfo = edt)

_Did you know that despite being over 2,500 miles (4,200 km) wide (about as wide as the continental United States or the European Union), China has only one official timezone? There's a second, unofficial timezone, too. It is used by much of the Uyghur population in the Xinjiang region in the far west of China._

# What time did the bike leave in UTC?
Having set the timezone for the first ten rides that W20529 took, let's see what time the bike left in UTC.

In [None]:
# Loop over the trips
for trip in onebike_datetimes[:10]:
  # Pull out the start
  dt = trip['start']
  # Move dt to be in UTC
  dt = dt.astimezone(timezone.utc)
  
  # Print the start time in UTC
  print('Original:', trip['start'], '| UTC:', dt.isoformat())

In [None]:
<script.py> output:
    Original: 2017-10-01 15:23:25-04:00 | UTC: 2017-10-01T19:23:25+00:00
    Original: 2017-10-01 15:42:57-04:00 | UTC: 2017-10-01T19:42:57+00:00
    Original: 2017-10-02 06:37:10-04:00 | UTC: 2017-10-02T10:37:10+00:00
    Original: 2017-10-02 08:56:45-04:00 | UTC: 2017-10-02T12:56:45+00:00
    Original: 2017-10-02 18:23:48-04:00 | UTC: 2017-10-02T22:23:48+00:00
    Original: 2017-10-02 18:48:08-04:00 | UTC: 2017-10-02T22:48:08+00:00
    Original: 2017-10-02 19:18:10-04:00 | UTC: 2017-10-02T23:18:10+00:00
    Original: 2017-10-02 19:37:32-04:00 | UTC: 2017-10-02T23:37:32+00:00
    Original: 2017-10-03 08:24:16-04:00 | UTC: 2017-10-03T12:24:16+00:00
    Original: 2017-10-03 18:17:07-04:00 | UTC: 2017-10-03T22:17:07+00:00

_Did you know that there is no official time zone at the North or South pole? Since all the lines of longitude meet each other, it's up to each traveler (or research station) to decide what time they want to use._
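
The last two exercises rely on the difference between `.replace(tzinfo=...)`, which attaches a timezone and leaves the clock reading alone, and `.astimezone()`, which keeps the same instant and changes the clock reading. A minimal sketch using the first trip start from the output above:

```python
from datetime import datetime, timedelta, timezone

edt = timezone(timedelta(hours=-4))
naive = datetime(2017, 10, 1, 15, 23, 25)

# Attach UTC-4: the clock reading stays at 15:23:25
local = naive.replace(tzinfo=edt)
print(local.isoformat())    # 2017-10-01T15:23:25-04:00

# Convert to UTC: same instant, new clock reading
in_utc = local.astimezone(timezone.utc)
print(in_utc.isoformat())   # 2017-10-01T19:23:25+00:00

# Aware datetimes compare by instant, so these are equal
print(local == in_utc)      # True
```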

# Putting the bike trips into the right time zone
Instead of setting the timezones for W20529 by hand, let's assign them to their IANA timezone: 'America/New_York'. Since we know their political jurisdiction, we don't need to look up their UTC offset. Python will do that for us.

In [None]:
# Import tz
from dateutil import tz

# Create a timezone object for Eastern Time
et = tz.gettz('America/New_York')

# Loop over trips, updating the datetimes to be in Eastern Time
for trip in onebike_datetimes[:10]:
  # Update trip['start'] and trip['end']
  trip['start'] = trip['start'].replace(tzinfo = et)
  trip['end'] = trip['end'].replace(tzinfo = et)

_Time zone rules actually change quite frequently. IANA time zone data gets updated every 3-4 months, as different jurisdictions make changes to their laws about time or as more historical information about timezones is uncovered. tz is smart enough to use the date in your datetime to determine which rules to use historically._

# What time did the bike leave? (Global edition)
When you need to move a datetime from one timezone into another, use .astimezone() and tz. Often you will be moving things into UTC, but for fun let's try moving things from 'America/New_York' into a few different time zones.

In [None]:
# Create the timezone object
uk = tz.gettz('Europe/London')

# Pull out the start of the first trip
local = onebike_datetimes[0]['start']

# What time was it in the UK?
notlocal = local.astimezone(uk)

# Print them out and see the difference
print(local.isoformat())
print(notlocal.isoformat())

In [None]:
<script.py> output:
    2017-10-01T15:23:25-04:00
    2017-10-01T20:23:25+01:00

In [None]:
# Create the timezone object
ist = tz.gettz('Asia/Kolkata')

# Pull out the start of the first trip
local = onebike_datetimes[0]['start']

# What time was it in India?
notlocal = local.astimezone(ist)

# Print them out and see the difference
print(local.isoformat())
print(notlocal.isoformat())

In [None]:
<script.py> output:
    2017-10-01T15:23:25-04:00
    2017-10-02T00:53:25+05:30

In [None]:
# Create the timezone object
sm = tz.gettz('Pacific/Apia')

# Pull out the start of the first trip
local = onebike_datetimes[0]['start']

# What time was it in Samoa?
notlocal = local.astimezone(sm)

# Print them out and see the difference
print(local.isoformat())
print(notlocal.isoformat())

In [None]:
<script.py> output:
    2017-10-01T15:23:25-04:00
    2017-10-02T09:23:25+14:00

_Did you notice the time offset for this one? It's at UTC+14! Samoa used to be UTC-10, but in 2011 it changed to the other side of the International Date Line to better match New Zealand, its closest trading partner. However, they wanted to keep the clocks the same, so the UTC offset shifted from -10 to +14, since 24-10 is 14. Timezones... not simple!_
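
The arithmetic in that note can be checked with fixed offsets. This sketch assumes plain UTC-10 and UTC+14 offsets built with `timezone`, not the real `Pacific/Apia` rules, and the example date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

old_offset = timezone(timedelta(hours=-10))
new_offset = timezone(timedelta(hours=14))

# The same clock reading under each offset (example date is arbitrary)
before = datetime(2011, 12, 31, 12, 0, tzinfo=old_offset)
after = datetime(2011, 12, 31, 12, 0, tzinfo=new_offset)

# Identical clocks, but the instants are a full day apart
print(before - after)   # 1 day, 0:00:00
```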

# How many hours elapsed around daylight saving?
Since our bike data takes place in the fall, you'll have to do something else to learn about the start of daylight saving time.

Let's look at March 12, 2017, in the Eastern United States, when Daylight Saving kicked in at 2 AM.

If you create a datetime for midnight that night, and add 6 hours to it, how much time will have elapsed?

In [None]:
# Import datetime, timedelta, tz, timezone
from datetime import datetime, timedelta, timezone
from dateutil import tz

# Start on March 12, 2017, midnight, then add 6 hours
start = datetime(2017, 3, 12, tzinfo = tz.gettz('America/New_York'))
end = start + timedelta(hours=6)
print(start.isoformat() + " to " + end.isoformat())

In [None]:
<script.py> output:
    2017-03-12T00:00:00-05:00 to 2017-03-12T06:00:00-04:00

In [None]:
# Import datetime, timedelta, tz, timezone
from datetime import datetime, timedelta, timezone
from dateutil import tz

# Start on March 12, 2017, midnight, then add 6 hours
start = datetime(2017, 3, 12, tzinfo = tz.gettz('America/New_York'))
end = start + timedelta(hours=6)
print(start.isoformat() + " to " + end.isoformat())

# How many hours have elapsed?
print((end - start).total_seconds()/(60*60))

In [None]:
<script.py> output:
    2017-03-12T00:00:00-05:00 to 2017-03-12T06:00:00-04:00
    6.0

In [None]:
# Import datetime, timedelta, tz, timezone
from datetime import datetime, timedelta, timezone
from dateutil import tz

# Start on March 12, 2017, midnight, then add 6 hours
start = datetime(2017, 3, 12, tzinfo = tz.gettz('America/New_York'))
end = start + timedelta(hours=6)
print(start.isoformat() + " to " + end.isoformat())

# How many hours have elapsed?
print((end - start).total_seconds()/(60*60))

# What if we move to UTC?
print((end.astimezone(timezone.utc) - start.astimezone(timezone.utc))\
      .total_seconds()/(60*60))

In [None]:
<script.py> output:
    2017-03-12T00:00:00-05:00 to 2017-03-12T06:00:00-04:00
    6.0
    5.0

_When we compare times in local time zones, everything gets converted into clock time. Remember if you want to get absolute time differences, always move to UTC!_
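
The same effect can be reproduced with the standard library alone. This sketch swaps `tz.gettz('America/New_York')` for two hand-built fixed offsets (EST as UTC-5, EDT as UTC-4), an assumption made only to keep the example free of dependencies.

```python
from datetime import datetime, timedelta, timezone

est = timezone(timedelta(hours=-5))   # before the 2 AM switch
edt = timezone(timedelta(hours=-4))   # after the switch

start = datetime(2017, 3, 12, 0, 0, tzinfo=est)
end = datetime(2017, 3, 12, 6, 0, tzinfo=edt)

# Dropping tzinfo compares the clock readings only
wall_clock = end.replace(tzinfo=None) - start.replace(tzinfo=None)
print(wall_clock.total_seconds() / 3600)   # 6.0

# Subtracting datetimes with different tzinfo objects compares instants
elapsed = end - start
print(elapsed.total_seconds() / 3600)      # 5.0
```

Only five hours pass on the real timeline, because the clock skipped from 2 AM to 3 AM.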

# March 29, throughout a decade
Daylight Saving rules are complicated: they're different in different places, they change over time, and they usually start on a Sunday (and so they move around the calendar).

For example, in the United Kingdom, as of the time this lesson was written, Daylight Saving begins on the last Sunday in March. Let's look at the UTC offset for March 29, at midnight, for the years 2000 to 2010.

In [None]:
# Import datetime and tz
from datetime import datetime
from dateutil import tz

# Create starting date
dt = datetime(2000, 3, 29, tzinfo = tz.gettz('Europe/London'))

# Loop over the dates, replacing the year, and print the ISO timestamp
for y in range(2000, 2011):
  print(dt.replace(year=y).isoformat())

Let's break down the solution to see how it addresses the exercise:

- Importing the required modules: `from datetime import datetime` brings in the datetime class for working with date and time objects, and `from dateutil import tz` brings in the tz module for handling time zones.
- Creating the starting date: `dt = datetime(2000, 3, 29, tzinfo=tz.gettz('Europe/London'))` creates a datetime for midnight on March 29, 2000. Setting tzinfo with `tz.gettz('Europe/London')` assigns the 'Europe/London' time zone to dt.
- Looping over the years: `for y in range(2000, 2011):` iterates over the years 2000 to 2010 inclusive.
- Replacing the year and printing: inside the loop, `dt.replace(year=y)` creates a new datetime with the year replaced by y and every other component unchanged. Calling `.isoformat()` on the result gives an ISO 8601 string with the date, time, and UTC offset, which is then printed.

Together, these steps print the UTC offset in effect at midnight on March 29 in each year from 2000 to 2010.

In [None]:
<script.py> output:
    2000-03-29T00:00:00+01:00
    2001-03-29T00:00:00+01:00
    2002-03-29T00:00:00+00:00
    2003-03-29T00:00:00+00:00
    2004-03-29T00:00:00+01:00
    2005-03-29T00:00:00+01:00
    2006-03-29T00:00:00+01:00
    2007-03-29T00:00:00+01:00
    2008-03-29T00:00:00+00:00
    2009-03-29T00:00:00+00:00
    2010-03-29T00:00:00+01:00

_As you can see, the rules for Daylight Saving are not trivial. When in doubt, always use tz instead of hand-rolling timezones, so it will catch the Daylight Saving rules (and rule changes!) for you._
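
To see why March 29 flips between offsets, it helps to compute the switch date for each year. The sketch below uses only the standard library and assumes the last-Sunday-in-March rule quoted above held for every year in the range; `last_sunday_of_march` is a hypothetical helper written for this illustration.

```python
from datetime import date, timedelta

def last_sunday_of_march(year):
    """Return the date of the last Sunday in March of the given year."""
    d = date(year, 3, 31)
    # weekday() is 0 for Monday through 6 for Sunday
    return d - timedelta(days=(d.weekday() - 6) % 7)

for y in range(2000, 2011):
    switch = last_sunday_of_march(y)
    # The change happens in the early hours of the switch day,
    # so midnight on that day is still on winter time
    summer_time = date(y, 3, 29) > switch
    print(y, switch.isoformat(), '+01:00' if summer_time else '+00:00')
```

The offsets printed this way can be compared line by line with the `tz` output above.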

# Finding ambiguous datetimes
At the end of lesson 2, we saw something anomalous in our bike trip duration data. Let's see if we can identify what the problem might be.

In [None]:
# Loop over trips
for trip in onebike_datetimes:
  # Rides with ambiguous start
  if tz.datetime_ambiguous(trip['start']):
    print("Ambiguous start at " + str(trip['start']))
  # Rides with ambiguous end
  if tz.datetime_ambiguous(trip['end']):
    print("Ambiguous end at " + str(trip['end']))

In [None]:
<script.py> output:
    Ambiguous start at 2017-11-05 01:56:50-04:00
    Ambiguous end at 2017-11-05 01:01:04-04:00

_Avoid ambiguous datetimes in practice by storing datetimes in UTC._

# Cleaning daylight saving data with fold
As we've just discovered, there is a ride in our data set which is being messed up by a Daylight Saving shift. Let's clean up the data set so we actually have a correct minimum ride length. We can use the fact that we know the end of the ride happened after the beginning to fix up the duration messed up by the shift out of Daylight Saving.

Since Python does not handle tz.enfold() when doing arithmetic, we must put our datetime objects into UTC, where ambiguities have been resolved.

In [None]:
trip_durations = []
for trip in onebike_datetimes:
  # When the start is later than the end, set the fold to be 1
  if trip['start'] > trip['end']:
    trip['end'] = tz.enfold(trip['end'])
  # Convert to UTC
  start = trip['start'].astimezone(tz.UTC)
  end = trip['end'].astimezone(tz.UTC)

  # Subtract the difference
  trip_length_seconds = (end-start).total_seconds()
  trip_durations.append(trip_length_seconds)

# Take the shortest trip duration
print("Shortest trip: " + str(min(trip_durations)))

<script.py> output:
    Shortest trip: 116.0

_Now you know how to handle some pretty gnarly edge cases in datetime data. To give a sense for how tricky these things are: we actually still don't know how long the rides are which only started or ended in our ambiguous hour but not both. If you're collecting data, store it in UTC or with a fixed UTC offset!_
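
To see what the fix does for the one affected ride, here is a standard-library sketch using the ambiguous start and end printed earlier. It assumes the ride began on daylight time (UTC-4) and ended after the clocks fell back (UTC-5), and it uses fixed offsets in place of `tz.enfold()`.

```python
from datetime import datetime, timedelta, timezone

edt = timezone(timedelta(hours=-4))   # before the clocks fall back
est = timezone(timedelta(hours=-5))   # after the clocks fall back

# Naive clock readings: the ride seems to end before it starts
naive_start = datetime(2017, 11, 5, 1, 56, 50)
naive_end = datetime(2017, 11, 5, 1, 1, 4)
naive_seconds = (naive_end - naive_start).total_seconds()
print(naive_seconds)   # -3346.0

# Same readings with the assumed offsets, compared in UTC
start = naive_start.replace(tzinfo=edt).astimezone(timezone.utc)
end = naive_end.replace(tzinfo=est).astimezone(timezone.utc)
fixed_seconds = (end - start).total_seconds()
print(fixed_seconds)   # 254.0
```

Under these assumptions the ride lasted a little over four minutes, and the negative duration came purely from the repeated hour.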

# Loading a csv file in Pandas
The capital-onebike.csv file covers the October, November, and December rides of the Capital Bikeshare bike W20529.

Here are the first two columns:

| Start date | End date | ... |
| --- | --- | --- |
| 2017-10-01 15:23:25 | 2017-10-01 15:26:26 | ... |
| 2017-10-01 15:42:57 | 2017-10-01 17:49:59 | ... |


In [None]:
# Import pandas
import pandas as pd

# Load CSV into the rides variable
rides = pd.read_csv('capital-onebike.csv', 
                    parse_dates = ['Start date', 'End date'])

# Print the initial (0th) row
print(rides.iloc[0])

In [None]:
<script.py> output:
    Start date                        2017-10-01 15:23:25
    End date                          2017-10-01 15:26:26
    Start station number                            31038
    Start station                    Glebe Rd & 11th St N
    End station number                              31036
    End station             George Mason Dr & Wilson Blvd
    Bike number                                    W20529
    Member type                                    Member
    Name: 0, dtype: object

_Did you know that pandas has a `pd.read_excel()`, `pd.read_json()`, and even a `pd.read_clipboard()` function to read tabular data that you've copied from a document or website? Most have date parsing functionality too._

# Making timedelta columns
Earlier in this course, you wrote a loop to subtract datetime objects and determine how long our sample bike had been out of the docks. Now you'll do the same thing with Pandas.

In [None]:
# Subtract the start date from the end date
ride_durations = rides['End date'] - rides['Start date']

# Convert the results to seconds
rides['Duration'] = ride_durations.dt.total_seconds()

print(rides['Duration'].head())

In [None]:
<script.py> output:
    0     181.0
    1    7622.0
    2     343.0
    3    1278.0
    4    1277.0
    Name: Duration, dtype: float64

_Because Pandas supports method chaining, you could also perform this operation in one line:_ `rides['Duration'] = (rides['End date'] - rides['Start date']).dt.total_seconds()`

# How many joyrides?
Suppose you have a theory that some people take long bike rides before putting their bike back in the same dock. Let's call these rides "joyrides".

You only have data on one bike, so while you can't draw any bigger conclusions, it's certainly worth a look.

Are there many joyrides? How long were they in our data set? Use the median instead of the mean, because we know there are some very long trips in our data set that might skew the answer, and the median is less sensitive to outliers.

In [None]:
# Create joyrides
joyrides = (rides['Start station'] == rides['End station'])

# Total number of joyrides
print("{} rides were joyrides".format(joyrides.sum()))

# Median of all rides
print("The median duration overall was {:.2f} seconds"\
      .format(rides['Duration'].median()))

# Median of joyrides
print("The median duration for joyrides was {:.2f} seconds"\
      .format(rides[joyrides]['Duration'].median()))

In [None]:
<script.py> output:
    6 rides were joyrides
    The median duration overall was 660.00 seconds
    The median duration for joyrides was 2642.50 seconds

_Pandas makes analyses like these concise to write and reason about. Writing this as a for loop would have been more complex._

# It's getting cold outside, W20529
Washington, D.C. has mild weather overall, but the average high temperature in October (68°F / 20°C) is certainly higher than the average high temperature in December (47°F / 8°C). People also travel more in December, and they work fewer days so they commute less.

How might the weather or the season have affected the length of bike trips?

In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# Resample rides to daily, take the size, plot the results
rides.resample('D', on = 'Start date')\
  .size()\
  .plot(ylim = [0, 15])

# Show the results
plt.show()

![resim_2024-08-01_184037965](resim_2024-08-01_184037965.png)


In [None]:
# Import matplotlib
import matplotlib.pyplot as plt

# Resample rides to monthly, take the size, plot the results
rides.resample('M', on = 'Start date')\
  .size()\
  .plot(ylim = [0, 150])

# Show the results
plt.show()

![resim_2024-08-01_184216839](resim_2024-08-01_184216839.png)


_As you can see, the pattern is clearer at the monthly level: there were fewer rides in November, and then fewer still in December, possibly because the temperature got colder_.

# Members vs casual riders over time
Riders can either be "Members", meaning they pay yearly for the ability to take a bike at any time, or "Casual", meaning they pay at the kiosk attached to the bike dock.

Do members and casual riders drop off at the same rate over October to December, or does one drop off faster than the other?

As before, rides has been loaded for you. You're going to use the Pandas method .value_counts(), which returns the number of instances of each value in a Series. In this case, the counts of "Member" or "Casual".

In [None]:
# Resample rides to be monthly on the basis of Start date
monthly_rides = rides.resample('M', on = 'Start date')['Member type']

# Take the ratio of the .value_counts() over the total number of rides
print(monthly_rides.value_counts() / monthly_rides.size())

In [None]:
<script.py> output:
    Start date  Member type
    2017-10-31  Member         0.769
                Casual         0.231
    2017-11-30  Member         0.825
                Casual         0.175
    2017-12-31  Member         0.861
                Casual         0.139
    Name: Member type, dtype: float64

_Note that by default, .resample() labels Monthly resampling with the last day in the month and not the first. It certainly looks like the fraction of Casual riders went down as the number of rides dropped. With a little more digging, you could figure out if keeping Member rides only would be enough to stabilize the usage numbers throughout the fall._

# Combining groupby() and resample()
A very powerful method in Pandas is .groupby(). Whereas .resample() groups rows by some time or date information, .groupby() groups rows based on the values in one or more columns. For example, rides.groupby('Member type').size() would tell us how many rides there were by member type in our entire DataFrame.

.resample() can be called after .groupby(). For example, how long was the median ride by month, and by Membership type?

In [None]:
# Group rides by member type, and resample to the month
grouped = rides.groupby('Member type')\
  .resample('M', on = 'Start date')

# Print the median duration for each group
print(grouped['Duration'].median())

In [None]:
<script.py> output:
    Member type  Start date
    Casual       2017-10-31    1636.0
                 2017-11-30    1159.5
                 2017-12-31     850.0
    Member       2017-10-31     671.0
                 2017-11-30     655.0
                 2017-12-31     387.5
    Name: Duration, dtype: float64

_It looks like casual riders consistently took longer rides, but that both groups took shorter rides as the months went by. Note that, by combining grouping and resampling, you can answer a lot of questions about nearly any data set that includes time as a feature. Keep in mind that you can also group by more than one column at once._

# Timezones in Pandas

In [None]:
# Localize the Start date column to America/New_York
rides['Start date'] = rides['Start date'].dt.tz_localize(
    'America/New_York', ambiguous='NaT')

# Print first value
print(rides['Start date'].iloc[0])

In [None]:
<script.py> output:
    2017-10-01 15:23:25-04:00

In [None]:
# Localize the Start date column to America/New_York
rides['Start date'] = rides['Start date'].dt.tz_localize(
    'America/New_York', ambiguous='NaT')

# Print first value
print(rides['Start date'].iloc[0])

# Convert the Start date column to Europe/London
rides['Start date'] = rides['Start date'].dt.tz_convert('Europe/London')

# Print the new value
print(rides['Start date'].iloc[0])

In [None]:
<script.py> output:
    2017-10-01 15:23:25-04:00
    2017-10-01 20:23:25+01:00

_`dt.tz_convert()` converts to a new timezone, whereas `dt.tz_localize()` sets a timezone in the first place. You now know how to deal with datetimes in Pandas._

# How long per weekday?
Pandas has a number of datetime-related attributes within the .dt accessor. Many of them are ones you've encountered before, like .dt.month. Others are convenient and save time compared to standard Python, like .dt.day_name().

In [None]:
# Add a column for the weekday of the start of the ride
rides['Ride start weekday'] = rides['Start date'].dt.day_name()

# Print the median trip time per weekday
print(rides.groupby('Ride start weekday')['Duration'].median())

In [None]:
<script.py> output:
    Ride start weekday
    Friday       724.5
    Monday       810.5
    Saturday     462.0
    Sunday       902.5
    Thursday     652.0
    Tuesday      641.5
    Wednesday    585.0
    Name: Duration, dtype: float64

_There are `.dt` attributes for all of the common things you might want to pull out of a datetime, such as the day, month, year, hour, and so on, and also some additional convenience ones, such as quarter and week of the year out of 52._

# How long between rides?
For your final exercise, let's take advantage of Pandas indexing to do something interesting. How much time elapsed between rides?

In [None]:
# Shift the end dates down one row, then subtract them from the start dates
rides['Time since'] = rides['Start date'] - (rides['End date'].shift(1))

# Move from a timedelta to a number of seconds, which is easier to work with
rides['Time since'] = rides['Time since'].dt.total_seconds()

# Resample to the month
monthly = rides.resample('M', on = 'Start date')

# Print the average hours between rides each month
print(monthly['Time since'].mean()/(60*60))

In [None]:
<script.py> output:
    Start date
    2017-10-31    5.519
    2017-11-30    7.256
    2017-12-31    9.202
    Freq: M, Name: Time since, dtype: float64

# Summary: Dates and Calendars
## Chapter 1: Dates
In the first chapter of this course, we focused on dates in Python. The date() class takes year, month, and day arguments. A date object has accessors such as year and methods such as weekday(). Date objects can be compared like numbers, and you can use min(), max(), and sort() on them. Subtracting one date from another gives you a timedelta. To turn date objects into strings, use the isoformat() or strftime() methods.

## Chapter 2: Combining Dates and Times
In the second chapter of this course, we covered dates and times together. The datetime() class takes all the arguments of date(), plus hour, minute, second, and microsecond. All of the additional arguments are optional; when omitted, they default to zero. You can replace any value in a datetime with the replace() method. Use the total_seconds() method to convert a timedelta to a number of seconds. Use strptime() to turn strings into datetimes and strftime() to turn datetimes into strings.

## Chapter 3: Time Zones and Daylight Saving
In the third chapter of this course, we focused on time zones and daylight saving time. A datetime object is "timezone aware" when it has tzinfo set; otherwise it is "timezone naive". Setting a timezone tells a datetime how to align itself to UTC, the universal time standard. Use the replace() method to change the timezone of a datetime while leaving the date and time the same. Use the astimezone() method to shift the date and time to match the new timezone. dateutil.tz provides a comprehensive, up-to-date timezone database.

## Chapter 4: Easy and Powerful Timestamps in Pandas
In the fourth and final chapter of this course, we focused on using Pandas to handle dates and times. When reading a csv file, set the parse_dates argument to the list of columns that should be parsed as datetimes. If parse_dates doesn't work, use the pd.to_datetime() function. Grouping rows with groupby() lets you calculate aggregates per group, for example first(), min(), or mean(). resample() groups rows on the basis of a datetime column, by year, month, day, and so on. Use tz_localize() to set a timezone while keeping the date and time the same. Use tz_convert() to change the date and time to match a new timezone.