After cleaning the original dataset and drawing a sample from it, it is possible to predict, with an
accuracy of 71.74%, whether the tip for a NYC taxi trip will be
less than 20% or
greater than or equal to 20% of the charge. This works even though no information about the passengers is available, and that information would be essential for this task.
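To make the prediction target concrete, here is a minimal sketch of how each trip could be turned into a binary label and how an accuracy figure like the one above is computed. The field names `tip_amount` and `charge_amount` are hypothetical. The sketch also assumes that "the charge" means the pre-tip amount. The notebooks may define both differently.

```python
def tip_label(tip_amount: float, charge_amount: float, threshold: float = 0.20) -> int:
    """Return 1 if the tip is >= threshold of the charge, else 0.

    Assumption: `charge_amount` is the pre-tip charge (hypothetical field name).
    """
    if charge_amount <= 0:
        raise ValueError("charge_amount must be positive")
    return int(tip_amount / charge_amount >= threshold)


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage: label three hypothetical trips, then score some made-up predictions.
trips = [(1.5, 10.0), (2.5, 10.0), (4.0, 20.0)]  # (tip, charge)
labels = [tip_label(tip, charge) for tip, charge in trips]
print(labels)                         # [0, 1, 1]
print(accuracy(labels, [0, 1, 0]))
```

Framing the problem as a two-class label like this is what allows a single accuracy figure to summarize the model.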
For an extended version, see the IPython
notebooks that describe the complete process. You can find them in this repo, but for easier reading, use this