Description
This is a reopening of #1836. The suggestion there was to add a parameter to pd.merge, such as fillvalue, whose value would be used instead of NaN for missing values. This isn't simply solved by fillna, since introducing NaN into integer columns casts them to float, and they stay float after the fill.
#1836 also asked for an example where this would be useful. Admittedly, in my case there might be a simpler solution than merge, but here it is anyway.
I have a DataFrame with a single column which is basically an index: it contains distinct numbers. I also have a DataFrame where one column contains some (but not all) values from the same index, while others contain useful data. I want to extend this DataFrame to include all values from the index, filling the other columns with zeros. I do this by calling
pd.merge(df_with_index, smaller_df_with_data, on='col_index', how='outer').fillna(0)
and end up with a DataFrame where all columns except for col_index are cast to float.
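A minimal sketch reproducing the behaviour described above, with two possible workarounds. The sample values and the column name value are made up for illustration; only col_index and the merge call come from the description. The fillvalue parameter proposed in this issue is not used here, since it does not exist in pd.merge.

```python
import pandas as pd

# Hypothetical sample data: a full index and a smaller frame with integer data.
df_with_index = pd.DataFrame({"col_index": [1, 2, 3, 4]})
smaller_df_with_data = pd.DataFrame({"col_index": [2, 4], "value": [10, 20]})

# The call from the description: the outer merge introduces NaN,
# so the integer column becomes float and stays float after fillna.
merged = pd.merge(
    df_with_index, smaller_df_with_data, on="col_index", how="outer"
).fillna(0)
print(merged.dtypes)

# Workaround 1 (assumption: casting back is acceptable): restore the
# original dtypes of the data columns after filling.
data_dtypes = smaller_df_with_data.dtypes.drop("col_index").to_dict()
restored = merged.astype(data_dtypes)

# Workaround 2 (possibly the "simpler solution than merge"): reindex with
# fill_value, which never introduces NaN in the first place.
reindexed = (
    smaller_df_with_data.set_index("col_index")
    .reindex(df_with_index["col_index"], fill_value=0)
    .rename_axis("col_index")
    .reset_index()
)
print(reindexed.dtypes)
```

The second approach only applies when the larger frame contributes nothing but the key column, as in this case; a general merge with a fill value would still need the cast in the first approach.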
Output of pd.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.14.23-1-MANJARO
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.22.0
pytest: None
pip: None
setuptools: 38.5.1
Cython: 0.27.3
numpy: 1.14.0
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2018.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None