
Worker node memory errors #861

Closed
hdoupe opened this issue Mar 23, 2018 · 8 comments

Comments

@hdoupe (Collaborator) commented Mar 23, 2018

Tax-Calculator 0.17.0 is using too much memory on our AWS machines for reforms that use the CPS file. Our current servers are memory-optimized Ubuntu 16.04 r3-large AWS instances, which means they have 2 CPUs and 16 GB of RAM. The stack trace from the memory error looks like:

[2018-03-22 18:01:52,195: ERROR/MainProcess] Task taxbrain_server.celery_tasks.dropq_task_async[cabf2b60-a605-4b9d-b2f5-d1324d0f31a1] raised unexpected: MemoryError()
Traceback (most recent call last):
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/celery/app/trace.py", line 374, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/celery/app/trace.py", line 629, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/ubuntu/deploy/taxbrain_server/celery_tasks.py", line 86, in dropq_task_async
    use_puf_not_cps=use_puf_not_cps, use_full_sample=True)
  File "/home/ubuntu/deploy/taxbrain_server/celery_tasks.py", line 70, in dropq_task
    user_mods=user_mods
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/tbi/tbi.py", line 107, in run_nth_year_tax_calc_model
    summ = summary(rawres1, rawres2, mask)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/tbi/tbi_utils.py", line 440, in summary
    tax_to_diff='payrolltax')
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/utils.py", line 584, in create_difference_table
    diffs = diff_table_stats(res2, groupby, baseline_income_measure)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/utils.py", line 520, in diff_table_stats
    diffs_without_sums = stat_dataframe(gpdf)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/taxcalc/utils.py", line 501, in stat_dataframe
    'benefit_value_total')
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/groupby.py", line 805, in apply
    return self._python_apply_general(f)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/groupby.py", line 809, in _python_apply_general
    self.axis)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/groupby.py", line 1964, in apply
    for key, (i, group) in zip(group_keys, splitter):
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/groupby.py", line 4660, in __iter__
    sdata = self._get_sorted_data()
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/groupby.py", line 4679, in _get_sorted_data
    return self.data._take(self.sort_idx, axis=self.axis, convert=False)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/generic.py", line 2150, in _take
    verify=True)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/internals.py", line 4264, in take
    axis=axis, allow_dups=True)
  File "/home/ubuntu/miniconda2/envs/aei_dropq/lib/python2.7/site-packages/pandas/core/internals.py", line 4150, in reindex_indexer
    for blk in self.blocks]
  File "/home/ubuntu/minicond

I wanted to make sure that there wasn't an issue with a particular server. So, I detached one of the production nodes, attached it to the test server, and updated it to use Tax-Calculator 0.17.0 and PB v1.5.0_rc6. However, the same issue occurred. Next, since a new memory-optimized instance generation, r4, has replaced the current generation, r3, and rather than going through the trouble of re-building the current worker-node environment, I did some further testing and minor tweaks of the containerized worker-node environment from PR #832 and deployed that on an r4-large server. Again, I ran into memory constraints. Finally, I spun up the next largest machine, an r4-xlarge, which has 4 CPUs and 30 GB of RAM. This did the trick. The test server is currently running on this instance.

Note that we do not need 30 GB of RAM; that is simply the next size up. The highest observed memory usage on this server is about 15 GB, so we were just over the limit for the r4-large machines. Memory usage sits at around 4-8 GB until the tables are built; at that point it climbs to roughly 10-15+ GB.
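
For reference, here is a minimal sketch of how peak memory use on a worker node could be tracked while a job runs. It assumes the psutil package is available on the node; it is only an illustration, not how the numbers above were collected.

import time
import psutil  # assumed to be installed on the worker node

def log_peak_memory(interval=5, duration=600):
    """Poll system-wide memory use and report the peak observed, in GB."""
    peak = 0.0
    end = time.time() + duration
    while time.time() < end:
        used_gb = psutil.virtual_memory().used / 1e9
        peak = max(peak, used_gb)
        time.sleep(interval)
    print("peak memory used: %.1f GB" % peak)

log_peak_memory()  # watch the node for 10 minutes while a job runs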

I plan to spin up 7-10 of the r4-xlarge machines for the production app. I will then manually install Docker on each machine and spin up the containers as built from the Dockerfiles in PR #832. The computation time for each year has also increased to about 90 seconds, so I will update the job-processing-time variable in a follow-up PR. This is similar to issue #750.

@MattHJensen (Contributor) commented Mar 23, 2018

It would be good to have an open issue for investigating approaches to return to the smaller nodes. I imagine there is a significant compute cost of the upgrade that we might not want to bear forever.

@hdoupe (Collaborator, Author) commented Mar 23, 2018

@MattHJensen Yes, I agree. Once this version is deployed, we can do some profiling and figure out what the issue is. My hunch is that it has to do with building the tables.

For now, we could keep the number of nodes down to 5 and see how much that would cost.

@martinholmer (Contributor) commented:

@MattHJensen said in PolicyBrain issue #861:

It would be good to have an open issue for investigating approaches to return to the smaller nodes. I imagine there is a significant compute cost of the upgrade that we might not want to bear forever.

Once the age tables are implemented, and we go from 10 dist/diff tables to 30 dist/diff tables, TaxBrain will need the r4-xlarge AWS instances, if not the next bigger AWS instance. There's no going back given the current development trajectory. The 10 existing tables are bigger in taxcalc 0.17.0 for two reasons: (1) the extra rows (for negative/zero income) at the bottom of the table and (2) the extra columns for three new variables (UBI, consumption value of benefits, and government cost of benefits).

@hdoupe (Collaborator, Author) commented Mar 23, 2018

This could be the case, but I pulled a result I ran locally to get an idea of how big the final 10-year results file is.

Below I show the process of pulling the results (for clarity, I removed a couple commands that threw errors):

In [1]: from webapp.apps.taxbrain.models import TaxSaveInputs, OutputUrl

In [3]: url = OutputUrl.objects.filter(pk=19) # knew that this was a full run with the CPS

In [4]: url
Out[4]: [<OutputUrl: OutputUrl object>]

In [6]: url[0].unique_inputs
Out[6]: <TaxSaveInputs: TaxSaveInputs object>

In [7]: url[0].unique_inputs.quick_calc # quick_calc is false --> full run
Out[7]: False

In [10]: url[0].taxcalc_vers # tc-version used
Out[10]: u'0.17.0'

In [11]: import json

In [12]: with open('result_19.json', 'w') as f: f.write(json.dumps(url[0].unique_inputs.tax_result)) # dump result

In [13]: res_19 = url[0].unique_inputs.tax_result # get the result

In [14]: res_19.keys() # table names
Out[14]: 
[u'aggr_1',
 u'aggr_2',
 u'diff_ptax_xdec',
 u'diff_comb_xbin',
 u'dist2_xdec',
 u'aggr_d',
 u'dist1_xdec',
 u'diff_comb_xdec',
 u'diff_ptax_xbin',
 u'dist2_xbin',
 u'diff_itax_xdec',
 u'diff_itax_xbin',
 u'dist1_xbin']

In [15]: res_19['diff_ptax_xbin'].keys() # row names show that this was a full run
Out[15]: 
[u'$200-500K_1',
 u'$200-500K_0',
 u'$200-500K_3',
 u'$50-75K_7',
 u'$200-500K_5',
 u'$200-500K_4',
 u'$200-500K_7',
 u'$200-500K_6',
 u'$200-500K_9',
 u'$200-500K_8',
 u'$75-100K_5',
 u'$200-500K_2',
 u'$10-20K_1',
 u'$75-100K_8',
 u'$50-75K_6',
 u'$75-100K_9',
 u'$0-10K_1',
 u'$10-20K_0',
 u'$50-75K_5',
 u'$75-100K_2',
 u'$10-20K_5',
 u'$75-100K_0',
 u'$75-100K_1',
 u'$0-10K_8',
 u'$0-10K_9',
 u'$75-100K_4',
 u'$10-20K_4',
 u'$0-10K_4',
 u'$0-10K_5',
 u'$0-10K_6',
 u'$0-10K_7',
 u'$0-10K_0',
 u'=$0K_1',
 u'$0-10K_2',
 u'$0-10K_3',
 u'=$0K_0',
 u'$30-40K_9',
 u'$30-40K_8',
 u'$30-40K_5',
 u'=$0K_7',
 u'$30-40K_7',
 u'$30-40K_6',
 u'$30-40K_1',
 u'$30-40K_0',
 u'$30-40K_3',
 u'$30-40K_2',
 u'=$0K_5',
 u'$30-40K_4',
 u'=$0K_4',
 u'$100-200K_5',
 u'$100-200K_4',
 u'$100-200K_7',
 u'$100-200K_6',
 u'$100-200K_1',
 u'$100-200K_0',
 u'$100-200K_3',
 u'$100-200K_2',
 u'$100-200K_9',
 u'$100-200K_8',
 u'<$0K_8',
 u'<$0K_9',
 u'<$0K_4',
 u'<$0K_5',
 u'<$0K_6',
 u'<$0K_7',
 u'<$0K_0',
 u'<$0K_1',
 u'<$0K_2',
 u'<$0K_3',
 u'$10-20K_3',
 u'$40-50K_9',
 u'$40-50K_8',
 u'$40-50K_5',
 u'$40-50K_4',
 u'$40-50K_7',
 u'$40-50K_6',
 u'$40-50K_1',
 u'$40-50K_0',
 u'$40-50K_3',
 u'$40-50K_2',
 u'=$0K_6',
 u'$20-30K_9',
 u'$20-30K_8',
 u'$20-30K_5',
 u'$20-30K_4',
 u'$20-30K_7',
 u'$20-30K_6',
 u'$20-30K_1',
 u'$20-30K_0',
 u'$20-30K_3',
 u'$20-30K_2',
 u'>$1000K_4',
 u'>$1000K_5',
 u'>$1000K_6',
 u'>$1000K_7',
 u'>$1000K_0',
 u'>$1000K_1',
 u'>$1000K_2',
 u'>$1000K_3',
 u'>$1000K_8',
 u'>$1000K_9',
 u'$10-20K_2',
 u'ALL_1',
 u'ALL_0',
 u'ALL_3',
 u'ALL_2',
 u'ALL_5',
 u'ALL_4',
 u'ALL_7',
 u'ALL_6',
 u'ALL_9',
 u'ALL_8',
 u'=$0K_9',
 u'=$0K_8',
 u'$10-20K_9',
 u'$10-20K_8',
 u'$50-75K_9',
 u'$50-75K_8',
 u'=$0K_3',
 u'=$0K_2',
 u'$10-20K_7',
 u'$50-75K_4',
 u'$50-75K_3',
 u'$50-75K_2',
 u'$50-75K_1',
 u'$50-75K_0',
 u'$500-1000K_1',
 u'$500-1000K_0',
 u'$500-1000K_3',
 u'$500-1000K_2',
 u'$500-1000K_5',
 u'$500-1000K_4',
 u'$500-1000K_7',
 u'$500-1000K_6',
 u'$500-1000K_9',
 u'$500-1000K_8',
 u'$75-100K_3',
 u'$75-100K_6',
 u'$10-20K_6',
 u'$75-100K_7']

In [17]: url[0].unique_inputs.data_source # using the CPS file
Out[17]: u'CPS'

In [18]: 

and the size of the JSON file containing the results:

HDoupe-MacBook-Pro:PolicyBrain henrydoupe$ ls -lh result_19.json 
-rw-r--r--  1 henrydoupe  staff   390K Mar 23 13:50 result_19.json

The final size of the results is pretty small at 390 KB. I think there is another issue with how the CPS file is treated when the tables are created. There are a lot of ways to do the same thing in pandas, and this benchmarking repo shows that some of them can be faster or use less memory than others. I think that before we resign ourselves to using a more powerful computer, we should take a look at whether we are using the pandas functions in an optimal way and whether there is a less memory-intensive approach to building these tables.
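
As a starting point for that profiling, here is a minimal sketch of how the peak memory of a table-building step could be measured. It assumes the memory_profiler package is installed, and build_tables is a hypothetical stand-in for the actual tbi table code, run against synthetic data of roughly CPS size.

import numpy as np
import pandas as pd
from memory_profiler import memory_usage  # assumed to be installed

def build_tables(df):
    # Hypothetical stand-in for the real tbi table-building step.
    gdf = df.groupby('decile')
    return gdf.apply(lambda g: g['tax_diff'].describe())

# Synthetic records, roughly the row count of the CPS-based file.
df = pd.DataFrame({'decile': np.random.randint(0, 10, size=450000),
                   'tax_diff': np.random.randn(450000)})

# memory_usage samples the process's memory (in MiB) while the callable runs.
peak_mib = max(memory_usage((build_tables, (df,))))
print('peak memory during table build: %.0f MiB' % peak_mib)

Running something like this against the real summary() inputs, once with the groupby().apply() path that appears in the traceback above and once with vectorized groupby().agg() calls, would show whether the current approach is the memory hog.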

@martinholmer (Contributor) commented:

@hdoupe said:

The final size of the results is pretty small at 390 KB. I think there is another issue with how the CPS file is treated when the tables are created. There are a lot of ways to do the same thing in pandas, and this benchmarking repo shows that some of them can be faster or use less memory than others. I think that before we resign ourselves to using a more powerful computer, we should take a look at whether we are using the pandas functions in an optimal way and whether there is a less memory-intensive approach to building these tables.

Of course, the JSON objects that hold the table results are tiny. It is all the work involved in constructing the JSON object that takes memory. For example, sorting a dataframe containing over 400,000 rows takes a lot of memory.
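
A rough illustration with synthetic data (not a measurement of the actual tbi code): sort_values returns a brand-new, fully materialized copy of the DataFrame, so during a sort the process transiently holds roughly twice the table's footprint.

import numpy as np
import pandas as pd

# Synthetic stand-in, roughly the shape of the CPS-based records.
df = pd.DataFrame(np.random.randn(450000, 40))

print('DataFrame footprint: %.0f MB' % (df.memory_usage(deep=True).sum() / 1e6))

# sort_values allocates a new sorted copy, so the original and the sorted
# frame are both alive until one of them is released.
sorted_df = df.sort_values(by=0)

And that is before any per-group copies made inside groupby().apply().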

Before you start changing the Tax-Calculator tbi logic to save some money, have you considered spot instances? They cost a fraction of what the corresponding on-demand instances cost. I'm assuming that PolicyBrain is running on on-demand AWS instances. If the workload is low (as I imagine it is all night and much of the day), you could save an enormous amount by switching to all (or mostly) spot instances. Read the AWS documentation on spot instances for the trade-offs. My years of experience with AWS spot instances were very positive.

@martinholmer (Contributor) commented:

@hdoupe, in your discussion of the AWS instances, you don't mention how many Tax-Calculator processes are running on each AWS instance. Is it just one process per instance? Or are you running two or more to use the multiple CPUs in each instance?

@hdoupe (Collaborator, Author) commented Mar 26, 2018

@martinholmer said:

Of course, the JSON objects that hold the table results are tiny. It is all the work involved in constructing the JSON object that takes memory. For example, sorting a dataframe containing over 400,000 rows takes a lot of memory.

Ah, right. Do you think it's possible that these dataframes are being duplicated somehow? To me, it just seems excessive that a data set I would expect to be only a few hundred megabytes, even after all of the new tax variables are added, is blowing up to 15 GB. However, there may be no way around that.

Before you start changing the Tax-Calculator tbi logic to save some money, have you considered spot instances? They cost a fraction of what the corresponding on-demand instances cost. I'm assuming that PolicyBrain is running on on-demand AWS instances. If the workload is low (as I imagine it is all night and much of the day), you could save an enormous amount by switching to all (or mostly) spot instances. Read the AWS documentation on spot instances for the trade-offs. My years of experience with AWS spot instances were very positive.

Thanks for the advice. I don't have very much knowledge about spot instances, but my impression is that they are useful for non-urgent computing. That makes them great for scientific computing, where you can run models overnight and during non-peak hours. On the other hand, PolicyBrain wants to kick off the simulation immediately and return the results relatively quickly, which is why I wasn't sure they would be a good fit. I could be totally wrong, though; please correct me if I am. I'll do some further research into this, and we can talk about whether they are a good fit for our needs.

In your discussion of the AWS instances, you don't mention how many Tax-Calculator processes are running on each AWS instance. Is it just one process per instance? Or are you running two or more to use the multiple CPUs in each instance?

We are running one process per machine. Granted, this was all set up before I started, and I was not given any information about why we chose these types of machines. I deduced that we chose instances with a high memory-to-CPU ratio because we are running independent processes with relatively large memory requirements. I'm completely open to re-thinking what type of machine we run these processes on. It could be that a setup with fewer instances, each with more memory and CPUs, is more cost-effective than more instances with fewer CPUs and less memory.
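
For what it's worth, here is a minimal sketch of how a Celery worker could be configured to run one process per CPU on a 2-CPU instance. The app and broker names are hypothetical, and each additional worker process would multiply the peak memory requirement discussed above.

from celery import Celery

# Hypothetical names; the real app lives in taxbrain_server/celery_tasks.py.
app = Celery('taxbrain_server', broker='redis://localhost:6379/0')

# Run two prefork worker processes, one per CPU on an r4-large.
# (Older Celery releases spell this setting CELERYD_CONCURRENCY.)
app.conf.worker_concurrency = 2

The same effect is available from the command line with celery worker --concurrency=2.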

@martinholmer Thank you for the thoughtful questions and advice. This is an area where I have very little experience (none prior to starting at OSPC), but I'm trying to pick things up as quickly as I can.

@hdoupe (Collaborator, Author) commented Mar 30, 2018

Resolved via PSLmodels/Tax-Calculator#1942. The test app is currently running on r4-large instances with no problems. We are still seeing a memory usage spike of around 12.5 GB, which gives us only about a 2 GB margin on these instances. This is something we should keep in mind.

One way to test this in the future is to run Tax-Calculator in Docker containers. You can cap the memory available to a container at anything up to your machine's limit. For example, you could simulate having only 5 GB of RAM: the process is killed if it goes over, just as it would be on a machine with only 5 GB of RAM.
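
As a rough sketch of that idea, assuming the Docker SDK for Python (the docker package) is installed and using a hypothetical worker image name; the CLI equivalent is docker run --memory=5g:

import docker  # Docker SDK for Python, assumed to be installed

client = docker.from_env()

# Start a hypothetical worker image with a hard 5 GB memory cap; the container
# is OOM-killed if the run exceeds the limit, which mimics running the job on
# a machine that has only 5 GB of RAM.
container = client.containers.run(
    'taxbrain-worker:latest',   # hypothetical image name
    mem_limit='5g',
    detach=True,
)
print(container.logs())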

However, my knowledge of this area is pretty limited, so if anyone else has ideas on how to do this, I am interested in discussing them.

Thanks for quickly fixing this @martinholmer.
