It was found that `tools.Tools.cleanCovid19Data()` sets most values in the Province/State column of `covid_19_data.csv` to NULL.
```scala
// ... (beginning of the expression not captured)
"Grand Princess").when(col("region")===col("region")||col("region")==="None"||col("region").rlike("Unknown"), null)
.otherwise(trim(col("region"))))
```
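On line 148, the condition `col("region")===col("region")` compares the column with itself, so it is true for every non-null value. Because it is OR-ed with the other checks, `.when(..., null)` matches every row that reaches it, and every value not filtered out before this point is set to NULL. (A NULL region fails the condition, but `trim` of NULL is still NULL, so it stays NULL.) The only values filtered out before this point are "Diamond Princess" and "Grand Princess".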
I also confirmed this visually by inspecting the output of `.show()`.
The quoted snippet is from `COVID-19-SPARK/src/main/scala/tools/Tools.scala`, lines 146 to 149 at commit b34872d.
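To illustrate why every row ends up NULL, here is a minimal sketch that isolates the quoted condition and contrasts it with one possible correction. It assumes `spark-sql` is on the classpath and that the column is already named `region`, as in the snippet. The object name `RegionCleanupSketch`, the sample values, and the `isNull` check in `fixedRegion` are hypothetical, my own guess at the intended behavior rather than the project's actual fix.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, trim, when}

object RegionCleanupSketch {
  // The condition from the quoted clause, isolated. col("region") === col("region")
  // is true for every non-null value, so every row reaching it becomes NULL.
  val buggyRegion: Column =
    when(col("region") === col("region") || col("region") === "None" || col("region").rlike("Unknown"), null)
      .otherwise(trim(col("region")))

  // Hypothetical correction (assumed intent): null out only missing or placeholder values.
  val fixedRegion: Column =
    when(col("region").isNull || col("region") === "None" || col("region").rlike("Unknown"), null)
      .otherwise(trim(col("region")))

  // Applies a region expression and returns the resulting values in row order.
  def regions(df: DataFrame, expr: Column): List[String] =
    df.withColumn("region", expr).collect().map(_.getString(0)).toList

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("region-cleanup-sketch").getOrCreate()
    import spark.implicits._
    // Hypothetical sample values for the Province/State column.
    val df = Seq(" Hubei ", "None", "Unknown location", null).toDF("region")
    println(regions(df, buggyRegion)) // List(null, null, null, null)
    println(regions(df, fixedRegion)) // List(Hubei, null, null, null)
    spark.stop()
  }
}
```

Replacing the self-comparison with `isNull` keeps the apparent intent (map missing and placeholder values to NULL) while letting real province names through, trimmed. If the data encodes missing values some other way, the condition would need adjusting accordingly.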