
Error happens when I try to load data #359

Closed
andrewzhang1 opened this issue Jun 8, 2017 · 5 comments
@andrewzhang1

Steps to reproduce the behavior

I used a Jupyter notebook with Anaconda Python 2.7 on my local PC to do a project with PixieDust. Here is the error I got when I tried to load a data frame:

!pip install pixiedust
import pixiedust
df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")

Downloading 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD' from https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD
Downloaded 214106 of 214106 bytes
Creating pySpark DataFrame for 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD'. Please wait...
Successfully created pySpark DataFrame for 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD'

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/environment.pyc in wrapper(*args, **kwargs)
83 kwargs.pop("fromScala")
84 fromScala = True
---> 85 retValue = func(*args, **kwargs)
86 if fromScala and retValue is not None:
87 from pixiedust.utils.javaBridge import JavaWrapper

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in sampleData(dataId)
78 def sampleData(dataId=None):
79 global dataDefs
---> 80 return SampleData(dataDefs).sampleData(dataId)
81
82 class SampleData(object):

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in sampleData(self, dataId)
91 return self.loadSparkDataFrameFromSampleData(dataDefs[str(dataId)])
92 elif "https://" in str(dataId) or "http://" in str(dataId) or "file://" in str(dataId):
---> 93 return self.loadSparkDataFrameFromUrl(str(dataId))
94 else:
95 print("Unknown sample data identifier. Please choose an id from the list below")

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in loadSparkDataFrameFromUrl(self, dataUrl)
126 "url": dataUrl
127 }
--> 128 return Downloader(dataDef).download(self.dataLoader)
129
130

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in download(self, dataLoader)
150 try:
151 print("Creating pySpark DataFrame for '{0}'. Please wait...".format(displayName))
--> 152 return dataLoader(path, self.dataDef.get("schema", None))
153 finally:
154 print("Successfully created pySpark DataFrame for '{0}'".format(displayName))

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in dataLoader(self, path, schema)
103 def dataLoader(self, path, schema=None):
104 #TODO: if in Spark 2.0 or higher, use new API to load CSV
--> 105 load = ShellAccess["sqlContext"].read.format('com.databricks.spark.csv')
106 if schema is not None:
107 def getType(t):

AttributeError: 'NoneType' object has no attribute 'read'
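
For context, the traceback bottoms out in PixieDust's dataLoader, where ShellAccess["sqlContext"] resolves to None in a notebook that has no Spark kernel, so the .read access fails. A minimal sketch of that failure mode (not the PixieDust source, just an illustration):

sqlContext = None  # what PixieDust finds in a plain Python kernel without Spark
try:
    # mirrors the failing line: ShellAccess["sqlContext"].read.format(...)
    load = sqlContext.read.format('com.databricks.spark.csv')
except AttributeError as err:
    print(err)  # "'NoneType' object has no attribute 'read'", as in the traceback above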

@isc-rsingh
Member

@andrewzhang1 I cannot reproduce. This works for me with Python 2.7.11 and Spark 2.0.2 with Pixiedust 1.0.6.

import pixiedust
sftest = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")
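
(For anyone comparing the two environments: a quick, hypothetical check of whether the kernel exposes the Spark SQLContext that PixieDust 1.0.6 relies on. The variable name sqlContext is the one a PySpark-enabled kernel normally injects; it is absent in a plain Python kernel.)

try:
    sqlContext  # present in a Spark-enabled kernel, absent in a plain Python kernel
    print("Spark SQLContext found:", type(sqlContext))
except NameError:
    print("No sqlContext in this kernel -- sampleData() will hit the AttributeError above")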

@andrewzhang1
Author

@rajrsingh I attended David Taieb's demo yesterday, and I did not use IBM's Data Science cloud environment. Instead, I used my own local Jupyter on my Linux machine. I showed this error to David, and he told me there might be a bug there. I'd like to confirm that no Spark needs to be installed (or connected) on my local machine in order to use PixieDust.

@andrewzhang1
Author

By the way, I'm not the only one getting this error when using a local Jupyter notebook.

@DTAIEB
Member

DTAIEB commented Jun 9, 2017

@andrewzhang1 Yes, this is a valid bug that happens when running PixieDust on a plain Python Notebook without Spark.
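
Until a fix ships, one hedged workaround in a Spark-less notebook is to bypass sampleData() and pull the CSV straight into pandas, which does not need Spark (same URL as in the report):

import pandas as pd

url = "https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD"
df = pd.read_csv(url)  # pandas reads directly from the URL, no Spark required
print(df.shape)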

@DTAIEB DTAIEB self-assigned this Jun 9, 2017
@DTAIEB DTAIEB added this to the 1.0.7 milestone Jun 9, 2017
@DTAIEB DTAIEB modified the milestones: 1.0.8, 1.0.7 Jul 2, 2017
@vabarbosa vabarbosa self-assigned this Jul 11, 2017
@vabarbosa vabarbosa added the bug label Jul 11, 2017
vabarbosa added a commit that referenced this issue Jul 11, 2017: update sampleData() api to successfully load data when notebook is not using spark
@vabarbosa
Member

sampleData() has been updated in commit 90937ef (PR #404) to successfully load data into a pandas DataFrame when in a notebook without Spark.
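
Assuming the behavior described above, usage in a notebook without Spark would look roughly like this (the pandas return type is taken from the comment above, not independently verified):

import pixiedust

df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")
print(type(df))  # expected: pandas.core.frame.DataFrame when no Spark is present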

@DTAIEB DTAIEB closed this as completed Jul 16, 2017