World Bank datareader cannot retrieve monthly data #198

Open
kurtforrester opened this Issue Apr 27, 2016 · 1 comment

kurtforrester commented Apr 27, 2016

I am trying to extract data from the World Bank and have been successful with the following code via pandas.io:

from pandas.io import wb
df_wb = wb.download(indicator=['GOLD', 'SILVER', 'COPPER', 'ZINC', 'LEAD',
                               'NICKEL', 'ALUMINUM'],
                    country='ALL', start='1980M01', end='2016M12')

This works and returns the data as desired, even though start and end expect int inputs. It most likely works only because the start and end inputs are not validated.

The same code fails with pandas-datareader. Because pandas-datareader has no way to specify the frequency of the data requested from the World Bank, supplying a datetime object or an int returns annual data only.

To return monthly data, the call signature of _BaseReader (in base.py) would need a new parameter specifying the frequency, for instance ..., freq='M', ..., consistent with pandas naming. That parameter would then need to be taken into account when building the URL for the World Bank database (in wb.py).

@property
def params(self):
    return {'date': '{0}:{1}'.format(self.start.year, self.end.year),
        'per_page': 25000, 'format': 'json'}

becomes

@property
def params(self):
    if self.freq == 'M':
        # Monthly range, e.g. 1980M01:2016M12
        return {'date': '{0}M{1:02d}:{2}M{3:02d}'.format(
                    self.start.year, self.start.month,
                    self.end.year, self.end.month),
                'per_page': 25000, 'format': 'json'}
    # Default: annual range, e.g. 1980:2016
    return {'date': '{0}:{1}'.format(self.start.year, self.end.year),
            'per_page': 25000, 'format': 'json'}
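
As a rough, standalone illustration of the date formatting proposed here, the sketch below pulls the same string logic into a plain function so it can be checked without the reader class. `build_date_param` is a hypothetical name used only for this example; it is not part of pandas-datareader.

```python
from datetime import datetime


def build_date_param(start, end, freq=None):
    """Return the World Bank 'date' query value for a start/end pair.

    Hypothetical helper for illustration only; it mirrors the string
    formatting proposed for ``params`` and is not a pandas-datareader API.
    """
    if freq == 'M':
        # Monthly ranges use YYYYMmm labels, e.g. 1980M01:2016M12.
        return '{0}M{1:02d}:{2}M{3:02d}'.format(
            start.year, start.month, end.year, end.month)
    # Default: annual range, e.g. 1980:2016.
    return '{0}:{1}'.format(start.year, end.year)


monthly = build_date_param(datetime(1980, 1, 1), datetime(2016, 12, 1), freq='M')
annual = build_date_param(datetime(1980, 1, 1), datetime(2016, 12, 1))
print(monthly)  # 1980M01:2016M12
print(annual)   # 1980:2016
```

Leaving `freq` at its default of `None` keeps the existing annual behaviour, so current callers would not be affected.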

kurtforrester commented Apr 27, 2016

I have looked at a solution but do not know how to get changes back into GitHub. Instructions welcome. So that there is a record of the change, I have provided it below. It addresses the issue of retrieving monthly data. An additional issue is that the index is no longer terribly useful, but this can be overcome with existing tools in pandas.

-------------------------- pandas_datareader/base.py --------------------------
index 46d3179..26a17c5 100644
@@ -42,13 +42,14 @@ class _BaseReader(object):
     _chunk_size = 1024 * 1024
     _format = 'string'

-    def __init__(self, symbols, start=None, end=None,
+    def __init__(self, symbols, start=None, end=None, freq=None,
                  retry_count=3, pause=0.1, session=None):
         self.symbols = symbols

         start, end = self._sanitize_dates(start, end)
         self.start = start
         self.end = end
+        self.freq = freq

         if not isinstance(retry_count, int) or retry_count < 0:
             raise ValueError("'retry_count' must be integer larger than 0")

--------------------------- pandas_datareader/wb.py ---------------------------
index e59b3e0..750369e 100644
@@ -117,7 +117,7 @@ class WorldBankReader(_BaseReader):
     _format = 'json'

     def __init__(self, symbols=None, countries=None,
-                 start=None, end=None,
+                 start=None, end=None, freq=None,
                  retry_count=3, pause=0.001, session=None, errors='warn'):

         if symbols is None:
@@ -126,7 +126,7 @@ class WorldBankReader(_BaseReader):
             symbols = [symbols]

         super(WorldBankReader, self).__init__(symbols=symbols,
-                                              start=start, end=end,
+                                              start=start, end=end, freq=freq,
                                               retry_count=retry_count,
                                               pause=pause, session=session)

@@ -154,6 +154,10 @@ class WorldBankReader(_BaseReader):

     @property
     def params(self):
+        if self.freq == 'M':
+            return {'date': '{0}M{1:02d}:{2}M{3:02d}'.format(self.start.year,
+                    self.start.month, self.end.year, self.end.month),
+                    'per_page': 25000, 'format': 'json'}
         return {'date': '{0}:{1}'.format(self.start.year, self.end.year),
                 'per_page': 25000, 'format': 'json'}
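
On the point about the index: a minimal sketch of one way to make monthly labels usable with pandas, assuming the monthly results come back labelled with strings such as '1980M01' (I have not confirmed the exact shape of the returned frame). `to_month_periods` is a hypothetical helper and the values are made up for illustration.

```python
import pandas as pd


def to_month_periods(labels):
    """Convert labels like '1980M01' into a monthly PeriodIndex.

    Hypothetical helper; assumes every label has the form YYYYMmm.
    """
    # The 'M' in the format string is a literal character, not a directive.
    return pd.to_datetime(list(labels), format='%YM%m').to_period('M')


# Made-up sample values standing in for one indicator of one country.
gold = pd.Series([1.0, 2.0, 3.0], index=['1980M01', '1980M02', '1980M03'])
gold.index = to_month_periods(gold.index)
print(gold.index)
```

With a PeriodIndex in place, the usual pandas tools such as slicing by date or resampling apply.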

kurtforrester reopened this Apr 27, 2016

kurtforrester pushed a commit to kurtforrester/pandas-datareader that referenced this issue Apr 27, 2016

Kurt Forrester
pydata#198
added `freq` parameter to `_BaseReader` and updated `def params` to accommodate monthly frequency data from the World Bank database.

sinhrks added the enhancement label Apr 28, 2016

bashtage added a commit to bashtage/pandas-datareader that referenced this issue Jan 18, 2018

pydata#198
added `freq` parameter to `_BaseReader` and updated `def params` to accommodate monthly frequency data from the World Bank database.
