larger raw file for testing adjust_rub() function #3

Closed · epogrebnyak opened this issue Jul 5, 2019 · 4 comments
Labels: enhancement (New feature or request), testing
Milestone: 0.1
@epogrebnyak (Collaborator) commented Jul 5, 2019

Make a larger file for testing with csvkit (for example, 10 + 10 + 50 rows).
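
A minimal sketch of how such a fixture could be generated with pandas (the column names, numeric columns, and output path below are illustrative assumptions, not the repo's actual schema; the 10 + 10 + 50 split follows the comment above):

# Hypothetical fixture builder -- column names and path are placeholders.
import numpy as np
import pandas as pd

def make_large_fixture(path="tests/raw_large.csv", seed=0):
    rng = np.random.default_rng(seed)
    parts = []
    # 10 rows in millions of rubles ("385"), 10 in rubles ("383"),
    # 50 already in thousands of rubles ("384")
    for unit, n in [("385", 10), ("383", 10), ("384", 50)]:
        parts.append(pd.DataFrame({
            "inn": [f"77{unit}{i:05d}" for i in range(n)],  # placeholder INN-like ids
            "unit": unit,
            "sales": rng.integers(1, 10_000, size=n),
            "profit": rng.integers(1, 1_000, size=n),
        }))
    df = pd.concat(parts, ignore_index=True)
    df.to_csv(path, index=False)
    return df

With a fixture like this, adjust_rub() can be checked by asserting that every row comes out with unit "384".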

@epogrebnyak epogrebnyak mentioned this issue Aug 6, 2019
@epogrebnyak epogrebnyak changed the title from "test suite" to "larger raw file for testing adjust_rub() function" Aug 19, 2019
@epogrebnyak epogrebnyak added this to the 0.1 milestone Aug 19, 2019
@epogrebnyak epogrebnyak added the enhancement label Aug 19, 2019
@epogrebnyak (Collaborator, Author) commented:

Code to be tested and speed-checked:

# FIXME: very slow code, even on small data
# maybe concatenating is faster?
#
# millions of rubles (unit "385") -> thousands ("384")
# bf = df[df.unit == "385"]
# bf.loc[:, cols] = bf.loc[:, cols].multiply(1000)
# bf.loc[:, "unit"] = "384"
# index = bf.index.tolist()
#
# rubles (unit "383") -> thousands ("384")
# tf = df[df.unit == "383"]
# tf.loc[:, cols] = tf.loc[:, cols].divide(1000).round(0).astype(int)
# tf.loc[:, "unit"] = "384"
# index.extend(tf.index.tolist())
#
# concat
# remains = df[~df.index.isin(index)]
# concat remains, bf, tf
def adjust_rub(df, cols=NUMERIC_COLUMNS):
    """Bring all rows to thousands of rubles (unit code "384")."""
    rows = (df.unit == "385")  # values in millions of rubles
    df.loc[rows, cols] = df.loc[rows, cols].multiply(1000)
    df.loc[rows, "unit"] = "384"
    rows = (df.unit == "383")  # values in rubles
    df.loc[rows, cols] = df.loc[rows, cols].divide(1000).round(0).astype(int)
    df.loc[rows, "unit"] = "384"
    return df
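
The FIXME above asks whether a split-and-concat version would be faster. A rough sketch of that variant, assuming the same NUMERIC_COLUMNS and unit codes (not benchmarked here, so treat it as a candidate to time, not a drop-in replacement):

import pandas as pd

def adjust_rub_concat(df, cols=NUMERIC_COLUMNS):
    """Same conversion to thousands of rubles ("384"), via splitting and re-concatenating."""
    bf = df[df.unit == "385"].copy()          # millions of rubles
    bf[cols] = bf[cols].multiply(1000)
    bf["unit"] = "384"

    tf = df[df.unit == "383"].copy()          # rubles
    tf[cols] = tf[cols].divide(1000).round(0).astype(int)
    tf["unit"] = "384"

    rest = df[~df.unit.isin(["383", "385"])]  # already in thousands
    return pd.concat([rest, bf, tf]).sort_index()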

epogrebnyak added 4 commits that referenced this issue Aug 21, 2019
@epogrebnyak (Collaborator, Author) commented Aug 21, 2019

Code for showing run times:

def canonic_df(df):
    """Transform the data inside the dataframe:

    - Bring all rows to the same unit of measurement (thousands of rubles)
    - Drop unused columns (date_revised, report_type)
    - New columns:
        * short company name
        * three levels of the OKVED code
        * region (derived from INN)

    """
    df_ = add_okved_subcode(add_region(add_title(df)))
    df_ = rename_rows(df_)
    df_ = adjust_rub(df_)
    return df_[canonic_columns()].set_index('inn')

print("obtaining source...")
root_df0 = boo.main.read_intermediate_df(2017)

print("canonic_df(df)")
df = root_df0.copy()
%timeit canonic_df(df)  

print("columns")
df = root_df0.copy()
%timeit add_okved_subcode(add_region(add_title(df)))

print("adjust rub")
df = root_df0.copy()
%timeit adjust_rub(df)

print("renaming")
df = root_df0.copy()
%timeit rename_rows(df)
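
Note that %timeit is an IPython/Jupyter magic. Outside a notebook, a rough equivalent using the standard-library timeit module could look like this (copying inside the lambda is deliberate, since adjust_rub mutates its argument and the conversion would otherwise be re-applied on every run):

import timeit

df = root_df0.copy()
seconds = timeit.timeit(lambda: adjust_rub(df.copy()), number=3) / 3
print(f"adjust_rub: {seconds:.2f} s per call")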

@epogrebnyak (Collaborator, Author) commented:

canonic_df(df)
1 loop, best of 3: 18.3 s per loop
columns
1 loop, best of 3: 14.5 s per loop
adjust rub
1 loop, best of 3: 2.08 s per loop
renaming
The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 181 ms per loop

@epogrebnyak (Collaborator, Author) commented:

Remaining questions branched to #13.
