When working with large files in pandas, we can run into memory errors or slow processing, because pandas is a very powerful tool but consumes a lot of RAM. In this repository I present a simple way to reduce the memory footprint of pandas dataframes using pandas dtype conversions and a few transformations.

marcroiglama/pandas_engineering

pandas_engineering

When working with large files in pandas, we can run into memory errors or slow processing. Pandas is a very powerful tool, but it consumes a lot of RAM if we don't preprocess the original dataframe a bit. In this repository I present a simple way to reduce the memory footprint of dataframes using pandas functions and tools.

Given a dataframe of 1684.11 MB, the memory usage is reduced to 128 MB! Find the stop_times.txt file at: https://transitfeeds.com/p/helsinki-regional-transport/735/20190111
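As a rough illustration of the kind of reduction described above, here is a minimal sketch. It is not necessarily the exact code used in this repository. The helper names `reduce_memory` and `memory_mb`, the `max_unique_ratio` threshold, and the synthetic demo columns are all assumptions made for the example. It downcasts numeric columns with `pd.to_numeric` and turns repetitive text columns into the `category` dtype.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory of df in MB, counting the real size of string objects."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


def reduce_memory(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes where the values allow it.

    Hypothetical helper: max_unique_ratio is an arbitrary threshold chosen
    for this sketch, not a value taken from the repository.
    """
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Picks the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # May become float32; pandas keeps float64 if it judges the
            # precision loss too large.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_object_dtype(series) or pd.api.types.is_string_dtype(series):
            # Repetitive text is stored once per distinct value as a category.
            if series.nunique() / max(len(series), 1) < max_unique_ratio:
                out[col] = series.astype("category")
    return out


# Synthetic stand-in for a large table (made-up values, not the real file).
rng = np.random.default_rng(0)
n = 100_000
demo = pd.DataFrame({
    "trip_id": rng.choice(["trip_a", "trip_b", "trip_c"], size=n),
    "stop_sequence": rng.integers(1, 60, size=n),
    "shape_dist_traveled": rng.random(n) * 30.0,
})

small = reduce_memory(demo)
print(f"{memory_mb(demo):.2f} MB -> {memory_mb(small):.2f} MB")
```

On the real file, the same helper could be applied to the result of `pd.read_csv("stop_times.txt")`, assuming the file has been downloaded to the working directory. Note that downcasting floats trades precision for memory, so it is worth checking that the affected columns tolerate it.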
