### Overview

At a high level, we'll break this project down into a few manageable tasks.

- Task 0: Inspect the data (data/neos.csv and data/cad.json). You already did this!
- Task 1: Build models to represent the data (models.py).
- Task 2: Extract the data into a custom database (extract.py and database.py).
- Task 3: Create filters to query the database, generate a stream of matching CloseApproach objects, and limit the result size (filters.py and database.py).
- Task 4: Save the data to a file (write.py).

As you implement these tasks, you'll unlock more and more functionality. When Task 2 is complete, you'll be able to run the inspect subcommand. When Task 3 is complete, you'll be able to run the query subcommand without the --outfile argument. When Task 4 is complete, you'll be able to run everything.

Remember, in this project you won't need to write any code that prompts the user for input - the main.py script will accept arguments from the command line or the interactive session and pass that information to the appropriate Python classes and functions that you create.

### Task 1: Design the objects that will store our data.

Well done! Now that we understand the project overview and our data set, it's time to start coding. The first thing we'll do is create Python objects to represent our data. In particular, we're going to create two classes in the models.py file:

- A NearEarthObject class, to represent the data for a single near-Earth object.
- A CloseApproach class, to represent the data for a single close approach of an NEO.

In doing so, we'll have to decide how to construct new instances of these classes, which attributes from our data set belong to each object, how to build a human-readable representation of each object, and which additional methods or properties, if any, we want to include. We'll also have to plan for how these objects will interact with each other.

### Designing the NearEarthObject class

The models.py file contains a starting template for the NearEarthObject class. Each instance of this class will represent a single near-Earth object.

```python
class NearEarthObject:
    # One possible signature; deciding what the constructor accepts is up to you.
    def __init__(self, **info):
        ...

    def __str__(self):
        ...
```

The `__init__` method is the constructor for the class. You will need to decide what arguments it should accept. If you make changes, you should also update the surrounding comments.

The `__str__` method will return a human-readable string that captures the contents of the instance for a human audience. In contrast, the prewritten `__repr__` method is stylized to be machine-readable.

Each NearEarthObject must have attributes (or gettable properties) for the following names:

- designation: The primary designation for this NearEarthObject.
- name: The IAU name for this NearEarthObject.
- diameter: The diameter, in kilometers, of this NearEarthObject.
- hazardous: Whether or not this NearEarthObject is potentially hazardous.
- approaches: A collection of this NearEarthObject's close approaches to Earth.

The starter code contains default values for some of these attributes - you should decide how, and if, to replace that code.

Recall that, even though every NEO in the data set has a nonempty primary designation, some NEOs have no name, and some NEOs have no diameter (it's unknown to NASA).

The designation should resolve to a string, the name should resolve to either a nonempty string or the value None, the diameter should resolve to a float (you should use float('nan') to represent an undefined diameter), and the hazardous flag should resolve to a boolean.
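Concretely, the conversion might look like the following minimal sketch. The helper name coerce_neo_fields is hypothetical. It assumes the raw values arrive as strings read from data/neos.csv and that the hazardous flag is spelled 'Y' or 'N'. Confirm that encoding against what you saw when inspecting the data in Task 0.

```python
def coerce_neo_fields(designation, name, diameter, hazardous):
    """Hypothetical helper: normalize raw (string) NEO fields to the required types."""
    designation = str(designation)
    # An empty or missing name becomes None rather than ''.
    name = name if name else None
    # An empty or missing diameter becomes NaN; anything else is parsed as a float.
    diameter = float(diameter) if diameter else float('nan')
    # Assumption: the raw hazardous flag is the string 'Y' for potentially hazardous.
    hazardous = hazardous == 'Y'
    return designation, name, diameter, hazardous


print(coerce_neo_fields('433', 'Eros', '16.84', 'N'))  # ('433', 'Eros', 16.84, False)
print(coerce_neo_fields('2020 FK', '', '', 'Y'))       # ('2020 FK', None, nan, True)
```

Doing this once, at construction time, means every other part of the program can rely on the attribute types without rechecking them.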

The approaches attribute, for now, can be an empty collection. In Task 2, you'll use the real data set to populate this collection with the real CloseApproach data.

The `__str__` method that you write is up to you - it'll determine how this object is printed, and should be human-readable. For inspiration, we adopted the following format:

```python
>>> neo = ...
>>> print(neo)
NEO {fullname} has a diameter of {diameter:.3f} km and [is/is not] potentially hazardous.
>>> eros = ...
>>> print(eros)
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.
```

In the above, {fullname} is either {designation} ({name}) if the name exists or simply {designation} otherwise. As a hint, this is a great opportunity for a property named fullname!
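Here is a minimal sketch of the fullname property and the `__str__` format above. It assumes a constructor that takes already-converted values as explicit parameters. That signature is hypothetical, since the starter template leaves the choice to you.

```python
class NearEarthObject:
    """A minimal sketch of an NEO with a `fullname` property and a readable `__str__`."""

    def __init__(self, designation, name=None, diameter=float('nan'), hazardous=False):
        self.designation = designation
        self.name = name
        self.diameter = diameter
        self.hazardous = hazardous
        self.approaches = []  # Filled in later, once close approaches are linked.

    @property
    def fullname(self):
        """'{designation} ({name})' when a name exists, otherwise just the designation."""
        if self.name:
            return f"{self.designation} ({self.name})"
        return self.designation

    def __str__(self):
        verb = "is" if self.hazardous else "is not"
        return (f"NEO {self.fullname} has a diameter of {self.diameter:.3f} km "
                f"and {verb} potentially hazardous.")


eros = NearEarthObject('433', 'Eros', 16.84, False)
print(eros)  # NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.
```

Because fullname is a property, `__str__` and any later code (such as the CloseApproach `__str__`) can share the same naming rule.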

### Designing the CloseApproach class

The models.py file also contains a starting template for the CloseApproach class. Each instance of this class will represent a single close approach to Earth by a near-Earth object.

```python
class CloseApproach:
    # One possible signature; deciding what the constructor accepts is up to you.
    def __init__(self, **info):
        ...

    def __str__(self):
        ...
```

The `__init__` method is the constructor for the class. You will need to decide what arguments it should accept. If you make changes, you should also update the surrounding comments.

The `__str__` method will return a human-readable string that captures the contents of the instance for a human audience. In contrast, the prewritten `__repr__` method is stylized to be machine-readable.

Each CloseApproach must have attributes (or gettable properties) for the following names:

- time: The date and time, in UTC, at which the NEO passes closest to Earth.
- distance: The nominal approach distance, in astronomical units, of the NEO to Earth at the closest point.
- velocity: The velocity, in kilometers per second, of the NEO relative to Earth at the closest point.
- neo: The NearEarthObject that is making a close approach to Earth.

The time should resolve to a Python datetime, the distance should resolve to a float, and the velocity should resolve to a float.

The neo attribute, for now, can be None. In its absence, you should include a _designation attribute with the primary designation of the close approach's NEO. In Task 2, you'll use the real data set and this _designation attribute to connect the neo attribute to a real NearEarthObject instance.

You can use the cd_to_datetime function in the helpers module to convert a calendar date from the format provided in cad.json (e.g. "1900-Jan-01 00:00") into a Python datetime object.
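In your own code, just call cd_to_datetime. The sketch below only illustrates the kind of conversion involved, using datetime.strptime. The function name and format string are assumptions inferred from the example string above, not the helper's actual implementation.

```python
from datetime import datetime


def parse_calendar_date(calendar_date):
    """Hypothetical stand-in for helpers.cd_to_datetime: '1900-Jan-01 00:00' -> datetime."""
    # %b matches an abbreviated month name such as 'Jan' (in an English/C locale).
    return datetime.strptime(calendar_date, "%Y-%b-%d %H:%M")


print(parse_calendar_date("1900-Jan-01 00:00"))  # 1900-01-01 00:00:00
```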

The `__str__` method that you write is up to you - it'll determine how this object is printed, and should be human-readable. For inspiration, we adopted the following format:

```python
>>> ca = ...
>>> print(ca)
On {time_str}, '{neo.fullname}' approaches Earth at a distance of {distance:.2f} au and a velocity of {velocity:.2f} km/s.
>>> halley_approach = ...
>>> print(halley_approach)
On 1910-05-20 12:49, '1P (Halley)' approaches Earth at a distance of 0.15 au and a velocity of 70.56 km/s.
```

You should use the datetime_to_str function from the helpers module to format the time attribute to a string without seconds. This is another great opportunity for a property!
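Here is a minimal sketch of a time_str property and the format above. The constructor signature is hypothetical, and strftime stands in for helpers.datetime_to_str. In your own code, call datetime_to_str. A SimpleNamespace stands in for a linked NearEarthObject.

```python
from datetime import datetime
from types import SimpleNamespace


class CloseApproach:
    """A minimal sketch of a close approach with a `time_str` property."""

    def __init__(self, designation, time, distance, velocity):
        self._designation = designation  # Used in Task 2 to find the matching NEO.
        self.time = time                 # A datetime, in UTC.
        self.distance = float(distance)  # In au.
        self.velocity = float(velocity)  # In km/s.
        self.neo = None                  # Linked to a NearEarthObject in Task 2.

    @property
    def time_str(self):
        # Stand-in for helpers.datetime_to_str: format without seconds.
        return self.time.strftime("%Y-%m-%d %H:%M")

    def __str__(self):
        # Fall back to the designation if the NEO hasn't been linked yet.
        who = self.neo.fullname if self.neo is not None else self._designation
        return (f"On {self.time_str}, '{who}' approaches Earth at a distance of "
                f"{self.distance:.2f} au and a velocity of {self.velocity:.2f} km/s.")


halley_approach = CloseApproach('1P', datetime(1910, 5, 20, 12, 49), 0.15, 70.56)
halley_approach.neo = SimpleNamespace(fullname='1P (Halley)')  # Stand-in NEO.
print(halley_approach)
```

Falling back to the designation keeps `print` working in Task 1, before Task 2 links the neo attribute.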

### Testing

Make sure to manually test your implementation at an interactive interpreter. Your interactive session might look something like:

```text
$ python3 -q
>>> from models import NearEarthObject, CloseApproach
>>> neo = NearEarthObject(...)  # Use any sample data here.
>>> print(neo.designation)
2020 FK
>>> print(neo.name)
One REALLY BIG fake asteroid
>>> print(neo.diameter)
12.345
>>> print(neo.hazardous)
True
>>> print(neo)
NEO 2020 FK (One REALLY BIG fake asteroid) has a diameter of 12.345 km and is potentially hazardous.
>>> ca = CloseApproach(...)  # Use any sample data here.
>>> print(type(ca.time))
<class 'datetime.datetime'>
>>> print(ca.time_str)
2020-01-01 12:30
>>> print(ca.distance)
0.25
>>> print(ca.velocity)
56.78
>>> print(ca)
On 2020-01-01 12:30, '2020 FK (One REALLY BIG fake asteroid)' approaches Earth at a distance of 0.25 au and a velocity of 56.78 km/s.
```

As you progress through the remaining tasks, you may have to revisit this file to adapt your implementation - that's expected!

Rubric Tip:

Check that you have error-handling code for the case in which an NEO has no name or no diameter. If there's no name, the name attribute should be None. If there's no diameter, the diameter attribute should be float('nan').

### Task 2: Extract data from structured files into Python objects.

Wonderful! Now that we've defined Python objects in models.py that can represent our data, let's extract the real data from our data sets.

For this task, we'll make changes in two files:

- In extract.py, we'll write functions that take the paths to our data files and extract structured data.
- In database.py, we'll capture this data in an NEODatabase, precompute auxiliary data structures, interconnect the NearEarthObjects and CloseApproaches, and provide the ability to fetch NEOs by designation or by name.

### Task 2a: Extract data from data files.

In the extract.py file, you'll implement the load_neos and load_approaches functions:


```python
def load_neos(neo_csv_path):
    """Read NEO data from a CSV file; return a collection of `NearEarthObject` instances."""
    ...


def load_approaches(cad_json_path):
    """Read close approach data from a JSON file; return a collection of `CloseApproach` instances."""
    ...
```

The neo_csv_path and cad_json_path arguments are Path-like objects corresponding either to the default data/neos.csv and data/cad.json or to some alternate location specified by the user at the command line. You can open(neo_csv_path) or open(cad_json_path) as usual.

In this module, you'll have to use the built-in csv and json modules. You'll also need to rely on the NearEarthObject and CloseApproach classes you defined in Task 1, which you may end up adapting along the way.
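As a rough sketch of the reading side, csv.DictReader and json.load can do most of the work. Several assumptions apply here; check them against what you saw in Task 0:

- neos.csv has columns named 'pdes', 'name', 'diameter', and 'pha'.
- cad.json holds a 'fields' list of column names and a 'data' list of records, including 'des', 'cd', 'dist', and 'v_rel'.

To stay self-contained, these hypothetical functions return plain dicts of raw strings. In your own load_neos and load_approaches you'd pass the values to NearEarthObject(...) and CloseApproach(...) instead, and let those constructors convert types.

```python
import csv
import json


def load_neo_rows(neo_csv_path):
    """Sketch: read each CSV row as a dict keyed by the header row."""
    with open(neo_csv_path, newline='') as infile:
        # Assumption: columns include 'pdes', 'name', 'diameter', and 'pha'.
        return [
            {'designation': row['pdes'], 'name': row['name'],
             'diameter': row['diameter'], 'hazardous': row['pha']}
            for row in csv.DictReader(infile)
        ]


def load_approach_rows(cad_json_path):
    """Sketch: pair each record in 'data' with the column names in 'fields'."""
    with open(cad_json_path) as infile:
        contents = json.load(infile)
    # Assumption: the file has a 'fields' list and a 'data' list of lists.
    fields = contents['fields']
    rows = [dict(zip(fields, record)) for record in contents['data']]
    return [
        {'designation': row['des'], 'time': row['cd'],
         'distance': row['dist'], 'velocity': row['v_rel']}
        for row in rows
    ]
```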

The collections returned by load_neos and load_approaches are then used by the main.py script to create an NEODatabase.

### Task 2b: Encapsulate the data in an NEODatabase.

In the database.py file, you'll implement the `__init__` constructor of the NEODatabase object and finish the get_neo_by_designation and get_neo_by_name methods. At the start, the NEODatabase class looks like:


```python
class NEODatabase:
    def __init__(self, neos, approaches):
        ...

    def get_neo_by_designation(self, designation):
        ...

    def get_neo_by_name(self, name):
        ...
```

The neos and approaches arguments provided to the NEODatabase constructor are exactly the objects produced by the load_neos and load_approaches functions of the extract module.

In the NEODatabase constructor, you must connect together the collection of NearEarthObjects and the collection of CloseApproaches. Specifically, for each close approach, you should determine to which NEO its _designation corresponds, and assign that NearEarthObject to the CloseApproach's .neo attribute (which we set to None in Task 1). Additionally, you should add this close approach to the NearEarthObject's .approaches attribute, which represents a collection of CloseApproaches (which we initialized to an empty collection in Task 1).
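The linking step might look like the following. It's written as a standalone hypothetical helper rather than the constructor itself. It assumes each NEO's approaches collection is a list and each close approach carries the _designation attribute from Task 1.

```python
def link_approaches(neos, approaches):
    """Hypothetical helper: connect each close approach to its NEO, and vice versa."""
    # Index NEOs by primary designation once, so each lookup is a dict access.
    by_designation = {neo.designation: neo for neo in neos}
    for approach in approaches:
        neo = by_designation.get(approach._designation)
        if neo is not None:
            approach.neo = neo               # Approach -> NEO.
            neo.approaches.append(approach)  # NEO -> approaches.
    return by_designation
```

Building the dictionary first turns what would be a nested loop over NEOs and approaches into a single pass over the approaches.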

In addition to storing the newly-connected NEOs and close approaches, you'll likely want to precompute some helpful auxiliary data structures that can speed up the get_neo_by_designation and get_neo_by_name methods. If you loop over every known NEO in those methods, the resulting code will be unnecessarily slow. What additional data structures can we attach to the NEODatabase that can assist with these methods?

Both the get_neo_by_designation and get_neo_by_name methods should return None if a matching NEO wasn't found in the database. For get_neo_by_name, neither the empty string nor None should ever be associated with an NEO. Furthermore, in the relatively rare case that multiple NEOs share the same name, it's acceptable to return any of them.
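One answer to the auxiliary-data-structure question is a pair of dictionaries built once in the constructor. The sketch below leaves out the linking step for brevity. The attribute names (_neos, _approaches, _by_designation, _by_name) are hypothetical.

```python
class NEODatabase:
    """Sketch: precompute dictionaries so lookups don't scan every NEO."""

    def __init__(self, neos, approaches):
        self._neos = list(neos)
        self._approaches = list(approaches)
        self._by_designation = {neo.designation: neo for neo in self._neos}
        # Only real names become keys: skip None and the empty string.
        self._by_name = {neo.name: neo for neo in self._neos if neo.name}

    def get_neo_by_designation(self, designation):
        return self._by_designation.get(designation)  # None if not found.

    def get_neo_by_name(self, name):
        return self._by_name.get(name)  # None if not found (including '' and None).
```

With duplicate names, the last NEO seen wins, which the rule above allows.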

### Testing

It's always a good idea to manually test your implementation at an interactive interpreter. However, starting with Task 2, we provide additional tools for you to check your code.

You can use the pre-written unit tests to check that each of your functions and methods are working as required:

```shell
$ python3 -m unittest --verbose tests.test_extract tests.test_database
```

There are a total of 21 unit tests for this task. When Task 2 is complete, all of the unit tests in these two modules will pass.

Furthermore, after completing Task 2 entirely, the inspect subcommand will fully work. Therefore, you can use the command line to test your code as well:

```shell
# Inspect the NEO with a name of Halley.
$ python3 main.py inspect --name Halley
NEO 1P (Halley) has a diameter of 11.000 km and is not potentially hazardous.

# Inspect the NEO with a primary designation of 433 (that's Eros!)
$ python3 main.py inspect --pdes 433
NEO 433 (Eros) has a diameter of 16.840 km and is not potentially hazardous.

# Verbosely inspect the NEO named Ganymed, listing its close approaches.
$ python3 main.py inspect --verbose --name Ganymed
NEO 1036 (Ganymed) has a diameter of 37.675 km and is not potentially hazardous.
- On 1911-10-15 19:16, '1036 (Ganymed)' approaches Earth at a distance of 0.38 au and a velocity of 17.09 km/s.
- On 1924-10-17 00:51, '1036 (Ganymed)' approaches Earth at a distance of 0.50 au and a velocity of 19.36 km/s.
- On 1998-10-14 05:12, '1036 (Ganymed)' approaches Earth at a distance of 0.46 au and a velocity of 13.64 km/s.
- On 2011-10-13 00:04, '1036 (Ganymed)' approaches Earth at a distance of 0.36 au and a velocity of 14.30 km/s.
- On 2024-10-13 01:56, '1036 (Ganymed)' approaches Earth at a distance of 0.37 au and a velocity of 16.33 km/s.
- On 2037-10-15 18:31, '1036 (Ganymed)' approaches Earth at a distance of 0.47 au and a velocity of 18.68 km/s.
```

Don't forget that you can use the interactive subcommand to repeatedly inspect NEOs without having to reload the database each time!

Rubric Tip: Check that the dictionary mapping NEO names to NearEarthObjects doesn't accidentally have a key for the empty string or None - those aren't NEO names and shouldn't be included.


### Task 3: Query close approaches with user-specified criteria.

Woohoo! You're making real progress. We can extract data from structured files, create NearEarthObject and CloseApproach instances to represent that data, and capture the data in an NEODatabase. Now, we'll provide the ability to query the data set of close approaches for a limited-size stream of matching results.

We'll split this task up into a few steps:

1. Create a collection of filters from the options given by the user at the command line.
2. Query the database's collection of close approaches to generate a stream of matching close approaches.
3. Limit the stream of results to at most some given maximum number.

There are several filters that we'll be implementing, corresponding to options from the query subcommand:

- Date (--date, --start-date, --end-date)
- Distance (--min-distance, --max-distance)
- Velocity (--min-velocity, --max-velocity)
- Diameter (--min-diameter, --max-diameter)
- Hazardous (--hazardous, --not-hazardous)

Of these, the date, distance, and velocity filters apply to attributes of an instance of CloseApproach, whereas the diameter and hazardous filters apply to attributes of an instance of NearEarthObject. The date filter operates on Python date and datetime objects; the distance, velocity, and diameter filters operate on floats, and the hazardous filter operates on booleans.

You have a lot of design freedom in the first and second steps. They are closely related, so it's a good idea to start with just one filter type (distance, perhaps) in step 1, so that you can build and test step 2. Once step 1 and step 2 are working with a single filter type, you can expand to implement each of the rest of the filters. You can also leverage the tests (in tests.test_query, with python3 -m unittest --verbose tests.test_query) to measure your steady progress through the first two steps.

### Task 3a: Creating filters.

For this step, you'll implement the create_filters function in the filters.py file. The main.py script calls this function with the options that the user provided at the command line.

```python
def create_filters(date=None, start_date=None, end_date=None,
                   distance_min=None, distance_max=None,
                   velocity_min=None, velocity_max=None,
                   diameter_min=None, diameter_max=None,
                   hazardous=None):
    ...
```

If the user didn't provide an option, its value will be None. Note that, if the user specifies --not-hazardous, the value of the hazardous argument will be False, not to be confused with None.

You have tons of flexibility in what this function returns. The main.py script takes whatever it receives and passes it directly to the query method that you'll implement in Task 3b.

Designing a program with this much flexibility can be daunting, so we've prepared a first step for one possible approach (from which you can, and likely will, deviate) - under this plan, the create_filters function will produce a collection of instances of subclasses of AttributeFilter - a helper class we've already provided to you. You don't need to rely on AttributeFilter or even use it at all - you can delete it and pursue your own implementation design - but here's the idea:

What do these filters have in common? Each of them compares (with <=, ==, or >=) some attribute (of a CloseApproach or a NearEarthObject) to a reference value. For example, the date filters check if the close approach date is equal to, less than or equal to, or greater than or equal to the date given on the command line. So, the three things that seem to be shared between all of our filters are (1) a way to get the attribute we're interested in and (2) a way to compare that attribute against (3) some reference value. Where there's shared behavior, there's an opportunity for decomposition.

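```python
class UnsupportedCriterionError(NotImplementedError):
    """A filter criterion is unsupported."""


class AttributeFilter:
    def __init__(self, op, value):
        self.op = op          # A comparison such as operator.le, operator.eq, or operator.ge.
        self.value = value    # The reference value from the command line.

    def __call__(self, approach):
        # Compare the attribute of interest against the reference value.
        return self.op(self.get(approach), self.value)

    @classmethod
    def get(cls, approach):
        # Concrete subclasses override this to fetch a specific attribute.
        raise UnsupportedCriterionError
```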

The three elements are present in the AttributeFilter superclass - in (1) the class method AttributeFilter.get, (2) the op argument to the constructor, and (3) the value argument to the constructor.

This abstract superclass's get method raises UnsupportedCriterionError, a custom subclass of NotImplementedError, but concrete subclasses will be able to override this method to actually get a specific attribute of interest. The op argument will represent the operation corresponding to either <=, ==, or >= - Python's operator module makes these available to us as operator.le, operator.eq, and operator.ge. That is, operator.ge(a, b) is the same as a >= b. Lastly, the value will just be our target value, as supplied by the user at the command line and fed to create_filters by the main module.

The `__call__` method makes instances of this type behave as callables - if we have an instance of a subclass of AttributeFilter named f, then the code `f(approach)` is really evaluating `f.__call__(approach)`. Specifically, "calling" the AttributeFilter with a CloseApproach object will get the attribute of interest (self.get(approach)) and compare it (via self.op) to the reference value (self.value), returning either True or False, representing whether that close approach satisfies the criterion.

As an example, suppose that we wanted to build an AttributeFilter that filtered on the designation attribute of the NearEarthObject attached to a CloseApproach (really, we wouldn't ever need this, because primary designations are unique and we already have NEODatabase.get_neo_by_designation). We could define a new subclass of AttributeFilter:


```python
import operator


class DesignationFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.neo.designation


# We could then create and use an instance of this new class:
approach_433 = CloseApproach(...)    # A close approach by NEO 433 (Eros).
approach_other = CloseApproach(...)  # A close approach by some other NEO.
f = DesignationFilter(operator.eq, '433')
f(approach_433)    # => True
f(approach_other)  # => False
```

This might seem complex - and it is. Are there different ways to do this? Well, yes. However, this is a relatively clean first approach, and the AttributeFilter is a first step towards unifying these filters, from which you can deviate freely.
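Following the same pattern, the remaining filters and create_filters might be sketched as below. Several simplifications apply:

- The subclass names (DateFilter, DistanceFilter, and so on) are hypothetical.
- get raises NotImplementedError instead of UnsupportedCriterionError, only to keep the block self-contained.
- Each approach is assumed to have the .time, .distance, .velocity, and .neo attributes designed in Task 1.

```python
import operator


class AttributeFilter:
    """Compare one attribute of a close approach against a reference value."""

    def __init__(self, op, value):
        self.op = op
        self.value = value

    def __call__(self, approach):
        return self.op(self.get(approach), self.value)

    @classmethod
    def get(cls, approach):
        raise NotImplementedError


class DateFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.time.date()  # Compare dates, not datetimes.


class DistanceFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.distance


class VelocityFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.velocity


class DiameterFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.neo.diameter


class HazardousFilter(AttributeFilter):
    @classmethod
    def get(cls, approach):
        return approach.neo.hazardous


def create_filters(date=None, start_date=None, end_date=None,
                   distance_min=None, distance_max=None,
                   velocity_min=None, velocity_max=None,
                   diameter_min=None, diameter_max=None,
                   hazardous=None):
    """Sketch: turn each supplied option into one filter; skip options that are None."""
    specs = [
        (DateFilter, operator.eq, date),
        (DateFilter, operator.ge, start_date),
        (DateFilter, operator.le, end_date),
        (DistanceFilter, operator.ge, distance_min),
        (DistanceFilter, operator.le, distance_max),
        (VelocityFilter, operator.ge, velocity_min),
        (VelocityFilter, operator.le, velocity_max),
        (DiameterFilter, operator.ge, diameter_min),
        (DiameterFilter, operator.le, diameter_max),
        (HazardousFilter, operator.eq, hazardous),
    ]
    # `is not None` keeps hazardous=False (from --not-hazardous) as a real filter.
    return [cls(op, value) for cls, op, value in specs if value is not None]
```

Listing the specs as a table keeps the mapping from command-line option to (attribute, comparison) in one place. Note that NEOs with an unknown (NaN) diameter fail every diameter comparison, because any ordering comparison involving NaN is False.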

### On Comparing Dates

So far, we've been treating dates (naive Python objects that store a year, month, and day) and datetimes (naive Python objects that store a year, month, day, hour, minute, and seconds) as essentially interchangeable. Mostly, we haven't cared too much about the details. However, dates and datetimes are not comparable (would "May 1st" be before, after, or equal to "May 1st at noon"?).

The date, start_date, and end_date arguments supplied to create_filters are dates, but the .time attribute of a CloseApproach is a datetime. You can use the .date() method on datetime objects to get the corresponding calendar date. That is, you aren't able to evaluate start_date <= approach.time <= end_date, but you are able to evaluate start_date <= approach.time.date() <= end_date.
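A quick demonstration of the difference follows. The helper name approach_in_range is hypothetical. Ordering comparisons between a date and a datetime raise TypeError. Equality comparisons don't raise; they simply come out False. That is arguably worse for --date, because the bug is silent.

```python
from datetime import date, datetime


def approach_in_range(approach_time, start_date, end_date):
    """Return True if a datetime falls on a calendar date within [start_date, end_date]."""
    # Convert the datetime to a date first; ordering a date against a datetime raises TypeError.
    return start_date <= approach_time.date() <= end_date


approach_time = datetime(2020, 1, 1, 12, 30)
start_date, end_date = date(2020, 1, 1), date(2020, 12, 31)

try:
    start_date <= approach_time  # Ordering a date against a datetime...
except TypeError:
    print("date <= datetime raised TypeError")  # ...is rejected by Python.

print(date(2020, 1, 1) == approach_time)  # False, even though the calendar days match.
print(approach_in_range(approach_time, start_date, end_date))  # True
```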

### Task 3b: Query the database of close approaches using user-specified criteria.

Let's turn our attention back to the database.py file. For this task, you'll implement the query method, which will generate a stream of CloseApproaches that match the user's criteria.

The query method accepts one argument - a collection of filters. The main.py script supplies to the query method whatever was returned from the create_filters function you implemented above.

You have a lot of freedom in how you implement this method - your implementation choice depends heavily on how you designed your filters in the previous section. In pseudo-code, we roughly expect the implementation to look something like the following:

```text
define query(filters):
  for each approach in the database's collection of close approaches:
    if this close approach passes each of the criteria:
      yield this close approach
```

As before, you can certainly deviate from this pattern, especially depending on how you chose to implement the previous step.
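Translated into Python, the pseudocode might look like this minimal sketch. It assumes each filter is a callable returning True or False, as an AttributeFilter instance is. It also assumes the database keeps its close approaches in a list named _approaches. The one-argument constructor is a simplification for illustration.

```python
class NEODatabase:
    """Sketch showing only `query`; assumes close approaches live in `self._approaches`."""

    def __init__(self, approaches):
        self._approaches = list(approaches)

    def query(self, filters=()):
        """Lazily generate each close approach that satisfies every filter."""
        for approach in self._approaches:
            # all() stops at the first filter that returns False.
            if all(f(approach) for f in filters):
                yield approach
```

Because query is a generator function, calling it does no filtering at all. Each `next()` call runs the loop only until the next match. With no filters, `all()` of an empty sequence is True, so every approach is yielded.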

Why yield? Recall that when we use yield in a Python function, it becomes a generator function, capable of pausing and resuming. Generators are often useful to represent sources of data streams. In our project, there might be thousands of close approaches matching the user's criteria, but we might only need to show the first ten (specified with the --limit command-line option). For these cases, we'll want the query method not to return a fully-computed collection of matching close approaches - which could take a while to compute - but rather to generate a stream of matching close approaches. In doing so, calling the query method becomes almost instantaneous, and the work to determine the next element of the generator (the next matching CloseApproach) is only done if another unit of code asks for it.

There are many other ways to optimize this method as well. For example, you could preprocess even more auxiliary data structures in the NEODatabase constructor to speed up specific queries. You might map dates to collections of close approaches that occurred on those dates, to speed up the --date criterion. You might order the close approaches by distance or velocity, or the NEOs by diameter, in order to more efficiently search for matches. Furthermore, you might be able to intelligently combine filters - for example, there are definitely no close approaches that are simultaneously closer than 0.1 au (--max-distance 0.1) to Earth and farther than 0.3 au (--min-distance 0.3) from Earth. Depending on the exact approach you take, some of these changes may affect the design of your filters or the create_filters function, but there are many opportunities for performance improvements.
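To illustrate the date-index idea, here is a hypothetical pair of helpers. The index could be built once in the NEODatabase constructor. It assumes each approach's .time is a datetime. Nothing about this is required for the task.

```python
from collections import defaultdict


def build_date_index(approaches):
    """Hypothetical helper: map each calendar date to the close approaches on that date."""
    index = defaultdict(list)
    for approach in approaches:
        index[approach.time.date()].append(approach)
    # Return a plain dict so lookups of missing dates don't insert empty lists.
    return dict(index)


def approaches_on(index, day):
    """Answer a --date query with one dictionary lookup instead of a full scan."""
    return index.get(day, [])
```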

However, while these additional optimizations are certainly interesting - and in many cases can speed up the time it takes to perform complex queries - they are in no way necessary to successfully complete this task. By following the pseudocode given above, you can query the collection of close approaches to generate (with yield) a stream of results that match user-specified criteria.

### Task 3c: Limit the results to at most some maximum number.

After the main.py script runs .query on the NEODatabase with the objects you produced in create_filters, it sends the stream of results through the limit function in the filters module. This is the next function that we'll write.


```python
def limit(iterator, n):
    ...
```

The first argument - iterator - represents a stream of data, as an iterable. In our pipeline, it will be the stream of CloseApproaches produced by the query method. The second argument - n - represents the maximum number of elements from the stream that might be produced by the limit function. If n is None or zero, you shouldn't limit the results at all.

You should not treat the iterator argument as being an in-memory aggregate data type, such as a list or a tuple. In particular, you should not slice the iterator argument.

Why restrict ourselves in this way? With any sufficiently large data set, we'd usually like to do the minimum number of operations necessary to achieve our goal. As just discussed, there are some queries for which, if we simply calculated and buffered all matching close approaches from the query method and sliced the result, the runtime would be just too slow. Although our data set may be small enough for the naive solution to be possible, it's still big enough to show a noticeable performance improvement from leveraging operations on iterators and generators.

As a hint (although it's not necessary), you may find the itertools.islice function helpful.
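Putting the hint into code, a minimal sketch of limit could look like this. One subtlety: islice(iterator, 0) produces nothing, while islice(iterator, None) produces everything. The sketch therefore maps 0 to None, matching the rule that zero means no limit.

```python
from itertools import count, islice


def limit(iterator, n=None):
    """Produce at most n values from iterator; n of None or 0 means no limit."""
    # islice(iterator, None) yields everything, so map 0 to None to match the spec.
    return islice(iterator, n or None)


# Works even on an infinite stream, because nothing is sliced in memory.
print(list(limit(count(), 3)))  # [0, 1, 2]
```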

### Testing

It's getting a little harder to manually test your implementations.

At the command line, as you implement more and more individual filters (and their effect on query), you'll unlock more and more of the options of the query subcommand. When this task is finished, the query subcommand will work completely, with the exception of --outfile. Here are a few examples:

```shell
# Query for close approaches on 2020-01-01
$ python3 main.py query --date 2020-01-01

# Query for close approaches in 2020.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31

# Query for close approaches in 2020 with a distance of <=0.1 au.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --max-distance 0.1

# Query for close approaches in 2020 with a distance of >=0.3 au.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --min-distance 0.3

# Query for close approaches in 2020 with a velocity of <=50 km/s.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --max-velocity 50

# Query for close approaches in 2020 with a velocity of >=25 km/s.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --min-velocity 25

# Query for close approaches of not potentially-hazardous NEOs between 500m and 600m in diameter.
$ python3 main.py query --min-diameter 0.5 --max-diameter 0.6 --not-hazardous

# Query for close approaches of potentially-hazardous NEOs larger than 2.5 km passing within 0.1 au at a speed of at least 35 km/s
# Hint: There's only one match in the whole dataset :)
$ python3 main.py query --max-distance 0.1 --min-velocity 35 --min-diameter 2.5 --hazardous
```

There are more examples at the start of this README and in the main.py file's module comment.

In some cases, you might want to inspect an NEO to check that the diameter and hazardous filters behave correctly.

Again, recall that you can use the interactive subcommand to load the database once and perform several query and inspect commands, so you don't have to wait for your code to reload the database with each command.

Additionally, you can use the pre-written unit tests to exercise each of these steps. You can read the test files if you'd like to see exactly which test cases we use.

```shell
$ python3 -m unittest tests.test_query tests.test_limit
```

There are a total of 37 unit tests for this task. You can use these tests during development as well. As you implement individual filter types, you'll pass more and more of the tests.

When this task is complete, all tests should pass.

Rubric Tip:

- In the create_filters function, differentiate between hazardous being False (from --not-hazardous) and None (from no option). Furthermore, a common bug is around comparing dates and datetimes - make sure you are comparing dates.
- If you follow the AttributeFilter subclass hierarchy suggested in the instructions, you'll have to define the classmethod get in each of the subclasses.
- The NEODatabase's query method should generate a stream of CloseApproaches that match the filters returned by create_filters.
- In Task 3c, to limit the values produced by an iterator to, at most, a maximum number, the simplest approach is to return itertools.islice(iterator, n), taking care that n of 0 means no limit (islice(iterator, 0) produces nothing, while islice(iterator, None) produces everything). You can have other creative approaches here.

### Task 4: Report the results.

Fantastic! You've successfully written code to filter and limit the database of close approaches with user-specified criteria. So far, the results have been simply printed to standard output.

For this task, you'll implement functions in write.py to save these results to an output file. You'll write two functions:


- write_to_csv: Write a stream of CloseApproach objects to a specific CSV file.
- write_to_json: Write a stream of CloseApproach objects to a specific JSON file.

Each of these functions accepts two arguments: results and filename.

The results parameter is a stream of CloseApproach objects, as produced by the limit function. The filename parameter is a Path-like object with the name of the output file. You can open(filename, 'w') as usual.

If there are no results, then write_to_csv should just write a header row, and write_to_json should just write an empty list.

### CSV Output Format

The write_to_csv function should write a stream of results to a CSV file and include a header row. Each row will represent one CloseApproach from the stream of results, and include information about the close approach as well as the associated NEO. The header columns should be: 'datetime_utc', 'distance_au', 'velocity_km_s', 'designation', 'name', 'diameter_km', 'potentially_hazardous'.

As an example, consider the CloseApproach when the NEO Eros approaches Earth on 2025-11-30 02:18. For this close approach, the corresponding row would be:


```text
datetime_utc,distance_au,velocity_km_s,designation,name,diameter_km,potentially_hazardous
...
2025-11-30 02:18,0.397647483265833,3.72885069167641,433,Eros,16.84,False
...
```

A missing name must be represented in the CSV output by the empty string (not the string 'None'). A missing diameter must be represented in the CSV output either by the empty string or by the string 'nan'. The potentially_hazardous flag must be represented in the CSV output either by the string 'False' or the string 'True' (not the values 0 or 1, nor the strings 'N' or 'Y'). The csv module will be able to handle some of these requirements for you.
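Here is a minimal sketch of the CSV side using csv.DictWriter. It assumes each close approach has already been flattened into a dict keyed by the header names, for example by merging the serialize() dictionaries recommended below. The name write_rows_to_csv is hypothetical, chosen so it isn't mistaken for the required write_to_csv. The csv module writes None as an empty string and booleans as 'True'/'False'. A NaN float comes out as 'nan'.

```python
import csv

FIELDNAMES = ('datetime_utc', 'distance_au', 'velocity_km_s',
              'designation', 'name', 'diameter_km', 'potentially_hazardous')


def write_rows_to_csv(rows, filename):
    """Sketch: write one CSV row per dict in `rows`, always starting with a header."""
    with open(filename, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
        writer.writeheader()  # Written even if there are no rows.
        for row in rows:
            # None becomes '', True/False become 'True'/'False', NaN becomes 'nan'.
            writer.writerow(row)
```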

### JSON Output Format

The write_to_json function should write a stream of results to a JSON file. The top-level JSON object must be a list, with each entry representing one CloseApproach from the stream of results. Each entry should be a dictionary mapping the keys 'datetime_utc', 'distance_au', 'velocity_km_s' to the associated values on the CloseApproach object and the key 'neo' to a dictionary mapping the keys 'designation', 'name', 'diameter_km', 'potentially_hazardous' to the associated values on the close approach's NEO.

As an example, consider the (same) CloseApproach when the NEO Eros approaches Earth on 2025-11-30 02:18. For this close approach, the corresponding entry would be:


```json
[
  {...},
  {
    "datetime_utc": "2025-11-30 02:18",
    "distance_au": 0.397647483265833,
    "velocity_km_s": 3.72885069167641,
    "neo": {
      "designation": "433",
      "name": "Eros",
      "diameter_km": 16.84,
      "potentially_hazardous": false
    }
  },
  ...
]
```

The datetime_utc value should be a string formatted with datetime_to_str from the helpers module. The distance_au and velocity_km_s values should be floats. The designation and name should be strings; if the name is missing, it must be the empty string (not null). The diameter_km should be a float; if the diameter is missing, it should be NaN, which is not part of the JSON standard but is what Python's json module writes for float('nan') and successfully reads back as float('nan'). Finally, potentially_hazardous should be a boolean, i.e. the JSON literal false or true, not the strings 'False' or 'True'.
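
JSON needs slightly more care than CSV. This standard-library check shows how json.dumps treats the same edge-case values: NaN and false already match the specification, but a None name becomes null, which does not, so the name has to be converted explicitly.

```python
import json
import math

# A missing name (None), a missing diameter (nan) and a boolean flag.
entry = {'name': None, 'diameter_km': float('nan'), 'potentially_hazardous': False}
text = json.dumps(entry)
print(text)  # {"name": null, "diameter_km": NaN, "potentially_hazardous": false}

# null violates the spec for a missing name, so replace None with ''.
fixed = json.dumps({**entry, 'name': entry['name'] or ''})
print(fixed)  # {"name": "", "diameter_km": NaN, "potentially_hazardous": false}

# Python's json loader turns NaN back into float('nan').
restored = json.loads(fixed)
print(math.isnan(restored['diameter_km']))  # True
```
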
Deduplicating Serialization

It can feel as though this output specification includes several edge cases. Fortunately, with the right design, these edge cases are easy to handle. While you are free to implement these methods however you like, we recommend adding .serialize() methods to the NearEarthObject and CloseApproach classes, each of which produces a dictionary of the attributes relevant to CSV or JSON serialization. These methods can then handle any edge cases in a single place. For example:


>>> neo = NearEarthObject(...)
>>> approach = CloseApproach(...)
>>> print(neo.serialize())
{'designation': '433', 'name': 'Eros', 'diameter_km': 16.84, 'potentially_hazardous': False}
>>> print(approach.serialize())
{'datetime_utc': '2025-11-30 02:18', 'distance_au': 0.397647483265833, 'velocity_km_s': 3.72885069167641}
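
Here is one possible sketch of that recommendation. The classes below are simplified dataclass stand-ins, not the project's NearEarthObject and CloseApproach: their constructors, the attribute names (time, distance, velocity, neo, designation, name, diameter, hazardous), and the local datetime_to_str stand-in (assumed to produce 'YYYY-MM-DD HH:MM', matching the examples above) are all hypothetical. The point is how both writers reuse the same serialize() output.

```python
import csv
import json
import os
import tempfile
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


def datetime_to_str(dt):
    """Hypothetical stand-in for helpers.datetime_to_str ('YYYY-MM-DD HH:MM')."""
    return dt.strftime('%Y-%m-%d %H:%M')


@dataclass
class NearEarthObject:
    """Simplified stand-in; the attribute names are assumptions."""
    designation: str
    name: Optional[str] = None
    diameter: float = float('nan')
    hazardous: bool = False

    def serialize(self):
        return {
            'designation': self.designation,
            'name': self.name or '',           # missing name -> empty string
            'diameter_km': self.diameter,      # missing diameter stays float('nan')
            'potentially_hazardous': self.hazardous,
        }


@dataclass
class CloseApproach:
    """Simplified stand-in; the attribute names are assumptions."""
    time: datetime
    distance: float
    velocity: float
    neo: NearEarthObject

    def serialize(self):
        return {
            'datetime_utc': datetime_to_str(self.time),
            'distance_au': self.distance,
            'velocity_km_s': self.velocity,
        }


FIELDNAMES = ('datetime_utc', 'distance_au', 'velocity_km_s',
              'designation', 'name', 'diameter_km', 'potentially_hazardous')


def write_to_csv(results, filename):
    with open(filename, 'w', newline='') as outfile:
        writer = csv.DictWriter(outfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        for approach in results:
            # Flatten the approach fields and its NEO's fields into one row.
            writer.writerow({**approach.serialize(), **approach.neo.serialize()})


def write_to_json(results, filename):
    # Nest the NEO's fields under the 'neo' key of each entry.
    entries = [{**approach.serialize(), 'neo': approach.neo.serialize()}
               for approach in results]
    with open(filename, 'w') as outfile:
        json.dump(entries, outfile, indent=2)


# Usage: write the Eros close approach from the examples above to both formats.
eros = NearEarthObject('433', 'Eros', 16.84, False)
approach = CloseApproach(datetime(2025, 11, 30, 2, 18),
                         0.397647483265833, 3.72885069167641, eros)
with tempfile.TemporaryDirectory() as tmpdir:
    csv_path = os.path.join(tmpdir, 'results.csv')
    json_path = os.path.join(tmpdir, 'results.json')
    write_to_csv([approach], csv_path)
    write_to_json([approach], json_path)
    with open(csv_path) as f:
        csv_text = f.read()
    with open(json_path) as f:
        json_data = json.load(f)
```

Because the edge cases live inside serialize(), neither writer needs special-case code: DictWriter merges the two dictionaries into one flat row, while the JSON writer nests the NEO dictionary under 'neo'.
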

### Testing

Congratulations! This was the final task for this project.

At this point, all of the unit tests should pass. Run the full suite with:


$ python3 -m unittest
.........................................................................
----------------------------------------------------------------------
Ran 73 tests in 3.666s

OK

Heck, run it with python3 -m unittest --verbose to verbosely celebrate all of the test cases that you have now made pass.

Tests for this specific task are in the tests.test_write module; to run only those, use python3 -m unittest tests.test_write.

Furthermore, the full command-line interface should now work, so you can use main.py freely, including the --outfile argument. For example:


# Save (the first) five close approaches on 2020-01-01 to a CSV file.
$ python3 main.py query --date 2020-01-01 --limit 5 --outfile results.csv

# Save (the first) five close approaches on 2020-01-01 to a JSON file.
$ python3 main.py query --date 2020-01-01 --limit 5 --outfile results.json

# Putting it all together.
# Save (the first) ten close approaches between 2020-01-01 and 2020-12-31 of a potentially-hazardous NEO larger than 250 m in diameter that passed within 0.1 au of Earth to a JSON file.
$ python3 main.py query --start-date 2020-01-01 --end-date 2020-12-31 --hazardous --min-diameter 0.25 --max-distance 0.1 --limit 10 --outfile results.json

Rubric Tip:

    You should use the csv module for CSV output and the json module for JSON output.
    The submitted code should pass all test cases and run without error.



### Recap

We've reviewed a lot of information. Here's a high-level overview of the main parts of each task.

    Task 0: Inspect data. (data/neos.csv and data/cad.json)
    Task 1: Build models. (models.py)
        Write __init__ and __str__ methods for NearEarthObject and CloseApproach
    Task 2a: Extract data. (extract.py)
        Implement load_neos and load_approaches to read data from CSV and JSON files.
    Task 2b: Process data. (database.py)
        Implement the constructor for NEODatabase, preprocessing the data to help with future queries.
        Write methods to get NEOs by primary designation or by name.
    Task 3a: Create filters. (filters.py)
        Define a hierarchy of Filters.
        Implement create_filters to create a collection of filters from user-specified criteria.
    Task 3b: Query matching close approaches. (database.py)
        Implement the query method to generate a stream of CloseApproaches that match the given filters.
    Task 3c: Limit results. (filters.py)
        Write limit to produce at most a specified number of values from a generator.
    Task 4: Save data. (write.py)
        Implement write_to_csv and write_to_json to save structured data to a formatted file.
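
For Task 3c, one way to write limit is to delegate to itertools.islice. This is a sketch: the signature limit(iterator, n=None) and the rule that a missing or zero n means "no limit" are assumptions here, so check them against the docstring in your own filters.py.

```python
from itertools import islice


def limit(iterator, n=None):
    """Yield at most n values from iterator.

    Assumed behaviour: a falsy n (None or 0) means no limit at all.
    """
    if not n:
        yield from iterator
    else:
        # islice pulls exactly n items and leaves the rest of the stream unread.
        yield from islice(iterator, n)


# Usage with an infinite generator of square numbers.
def squares():
    k = 0
    while True:
        yield k * k
        k += 1


print(list(limit(squares(), 5)))   # [0, 1, 4, 9, 16]
print(list(limit(iter([1, 2, 3]))))  # [1, 2, 3]
```

Because limit is itself a generator, it stays lazy: combined with a lazy query, only as many close approaches as requested are ever produced.
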
