Description
Code Sample, a copy-pastable example if possible
import pandas as pd

# Sample residual percents
residuals_pct = [0.044516001, 0.031137117, 1.06758E+64, 0.003522454, 0.065171486, 0.06033751, 0.01325514, -0.005620799, -0.006225719, 0.045713825, 0.039280786, 0.000531307]
# Create a DataFrame holding the residual percents
d = pd.DataFrame(residuals_pct, columns=['residual_pct'])
# Shift the percents one row down
d['residual_pct_shift'] = d.residual_pct.shift(1)
# Compute the adjusted residual percent as a 3-row rolling mean
d['adj_residual_pct'] = d['residual_pct_shift'][3:].rolling(window=3).mean()
Problem description
I am trying to implement the adjusted residual percent as the rolling average of the previous three residual percents.
Due to a data issue, the model forecasted the target variable on the scale of billions, which distorted the residual percentage calculation (see the 1.06758E+64 entry in the sample).
That is not the problem I am reporting, though. The problem is that when I compute the rolling average with .rolling(window=3).mean(), the averages for the rows after the abnormal residual percent are affected as well.
If I take the same numbers and average them the normal way, or use MS Excel, I get the correct result.
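To illustrate the discrepancy without pandas, here is a minimal pure-Python sketch. `direct_means` averages each window from scratch (the "normal way"), while `incremental_means` is a hypothetical helper that keeps a running sum, adding the value that enters the window and subtracting the one that leaves. Whether the rolling mean in my pandas version is computed this way is an assumption on my part, but the running-sum version shows the same kind of symptom: once the huge value has been added, the small values are lost to floating-point rounding and never come back.

```python
# Shifted residual percents from the sample above
# (the leading NaN produced by shift(1) is left out).
values = [0.044516001, 0.031137117, 1.06758e64, 0.003522454,
          0.065171486, 0.06033751, 0.01325514]
window = 3


def direct_means(xs, w):
    """Average every window from scratch (the "normal way")."""
    return [sum(xs[i - w + 1:i + 1]) / w for i in range(w - 1, len(xs))]


def incremental_means(xs, w):
    """Hypothetical sliding update: add the new value, subtract the old one."""
    total = sum(xs[:w])
    out = [total / w]
    for i in range(w, len(xs)):
        total += xs[i]       # value entering the window
        total -= xs[i - w]   # value leaving the window
        out.append(total / w)
    return out


direct = direct_means(values, window)
incremental = incremental_means(values, window)

# Compare the two results window by window.
for d_mean, i_mean in zip(direct, incremental):
    print(f"direct={d_mean!r:<26} incremental={i_mean!r}")
```

For the first window that no longer contains 1.06758e64, the direct average is about 0.0430, whereas the running-sum average comes out as 0.0 because the small values were absorbed while the huge value was still part of the sum.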
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.3.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.2.2
setuptools : 41.0.1
Cython : 0.29.13
pytest : 5.0.1
hypothesis : None
sphinx : 2.1.2
blosc : None
feather : None
xlsxwriter : 1.1.8
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.0
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 2.6.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.7
tables : 3.5.2
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.1.8
Is there anything that I am missing or unaware of? I checked the pandas source code and found nothing. Regardless of the scale of the numbers, Excel and the normal way of averaging work fine.
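In the meantime, a possible workaround sketch: passing `np.mean` to `rolling(...).apply(..., raw=True)` should evaluate each window independently, so a huge value can only affect the windows that actually contain it. This assumes `apply` recomputes every window from its own values, and it will likely be slower than `.mean()` on large data.

```python
import numpy as np
import pandas as pd

residuals_pct = [0.044516001, 0.031137117, 1.06758e64, 0.003522454,
                 0.065171486, 0.06033751, 0.01325514, -0.005620799,
                 -0.006225719, 0.045713825, 0.039280786, 0.000531307]

# Shift one row down so each row sees only the previous residual percents.
shifted = pd.Series(residuals_pct).shift(1)

# Average each 3-row window from its own values instead of calling .mean().
adj = shifted.rolling(window=3).apply(np.mean, raw=True)

print(adj)
```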
Please advise.
I greatly appreciate your help.
Thank you.