
Getting the exception below while reading a CSV file and writing it to a Parquet file #547

Closed
pallav1991 opened this issue Jun 1, 2019 · 9 comments

@pallav1991

pallav1991 commented Jun 1, 2019

Today, after upgrading, when I ran my program I got the exception below.


{Py4JJavaError}An error occurred while calling o125.parquet.
: java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SQLConf$.LEGACY_PASS_PARTITION_BY_AS_OPTIONS()Lorg/apache/spark/internal/config/ConfigEntry;
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:277)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
	at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.comm...


Can you tell me why this exception is occurring?
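
For context: a NoSuchMethodError like this usually means the PySpark Python package and the Spark JVM jars come from different versions, so the Python side references a config entry the JVM side does not have. A minimal check, assuming a local SparkSession can be created:

    import pyspark
    from pyspark.sql import SparkSession

    # The two versions printed here should match; a mismatch would explain
    # the NoSuchMethodError, since LEGACY_PASS_PARTITION_BY_AS_OPTIONS only
    # exists in newer Spark releases.
    print(pyspark.__version__)                  # Python-side PySpark version
    spark = SparkSession.builder.getOrCreate()
    print(spark.version)                        # JVM-side Spark version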


issue-label-bot added the bug label Jun 1, 2019
@argenisleon
Collaborator

Can you post here the exact code you are using?

@pallav1991
Author

pallav1991 commented Jun 2, 2019

    if self.requestSubType == "createCSVToParquet":
        df = self.manager.optimus.load.csv(self.filePath)
    elif self.requestSubType == "createExcelToParquet":
        df = self.manager.optimus.load.excel(self.filePath, self.sheetName)
    if df is not None:
        df = convertColumnParquetFormat(df)
        df.write.mode("overwrite").parquet(self.haddopPath + self.fileName + '.parquet')
        return "success"
    return "failed"

@argenisleon
Collaborator

argenisleon commented Jun 2, 2019

It seems like some kind of bug.
Can you save the data successfully to another format, like CSV?
Can you try:

self.manager.optimus.save.parquet(self.haddopPath + self.fileName + '.parquet')

It is almost the same as the native Spark save, plus a couple of checks.
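
A minimal sketch of the isolation test suggested above, assuming the same df from the earlier snippet: if this CSV write fails with the same NoSuchMethodError, the problem is the Spark write path itself (for example, a version mismatch) rather than the Parquet format:

    # Native PySpark CSV write; the path pieces mirror the issue author's code.
    df.write.mode("overwrite").csv(self.haddopPath + self.fileName + '.csv')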

@argenisleon
Collaborator

Hi @pallav1991, does this work for you?

@rajib76

rajib76 commented Jul 4, 2019

Is this bug fixed? I am also getting this.

@argenisleon
Collaborator

argenisleon commented Jul 4, 2019

Hi @rajib76, are you getting the same error? The code @pallav1991 provided uses
df.write.mode("overwrite").parquet(self.haddopPath + self.fileName + '.parquet'), which is PySpark code, not Optimus code.

Did you try:
optimus.save.parquet(self.haddopPath + self.fileName + '.parquet')

If that does not work, can you show us your code?

@pallav1991
Author

optimus.save

Sorry @argenisleon for the late reply.
I have a question before I execute the above request:
in the above code, where am I specifying the dataframe to write to the Parquet file?

@argenisleon
Collaborator

argenisleon commented Jul 8, 2019

My bad.

The correct code is:
df.save.parquet(self.haddopPath + self.fileName + '.parquet')
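
For reference, a hedged end-to-end sketch assembled from the snippets in this thread; the Optimus() constructor is the standard Optimus 2.x entry point, and the file paths here are illustrative:

    from optimus import Optimus

    op = Optimus()                        # creates/attaches a Spark session
    df = op.load.csv("input.csv")         # load, as in the issue author's code
    df.save.parquet("output.parquet")     # save via Optimus, per the fix above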
