
Error happens when I try to load data #359

Closed
andrewzhang1 opened this issue Jun 8, 2017 · 5 comments
@andrewzhang1

Steps to reproduce the behavior

I used a Jupyter notebook with Anaconda Python 2.7 on my local PC to do a project with PixieDust. Here is the error I got when I tried to load a data frame:

!pip install pixiedust
import pixiedust
df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")

Downloading 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD' from https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD
Downloaded 214106 of 214106 bytes
Creating pySpark DataFrame for 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD'. Please wait...
Successfully created pySpark DataFrame for 'https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD'

AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/environment.pyc in wrapper(*args, **kwargs)
83 kwargs.pop("fromScala")
84 fromScala = True
---> 85 retValue = func(*args, **kwargs)
86 if fromScala and retValue is not None:
87 from pixiedust.utils.javaBridge import JavaWrapper

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in sampleData(dataId)
78 def sampleData(dataId=None):
79 global dataDefs
---> 80 return SampleData(dataDefs).sampleData(dataId)
81
82 class SampleData(object):

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in sampleData(self, dataId)
91 return self.loadSparkDataFrameFromSampleData(dataDefs[str(dataId)])
92 elif "https://" in str(dataId) or "http://" in str(dataId) or "file://" in str(dataId):
---> 93 return self.loadSparkDataFrameFromUrl(str(dataId))
94 else:
95 print("Unknown sample data identifier. Please choose an id from the list below")

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in loadSparkDataFrameFromUrl(self, dataUrl)
126 "url": dataUrl
127 }
--> 128 return Downloader(dataDef).download(self.dataLoader)
129
130

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in download(self, dataLoader)
150 try:
151 print("Creating pySpark DataFrame for '{0}'. Please wait...".format(displayName))
--> 152 return dataLoader(path, self.dataDef.get("schema", None))
153 finally:
154 print("Successfully created pySpark DataFrame for '{0}'".format(displayName))

/home/azhang/anaconda/lib/python2.7/site-packages/pixiedust/utils/sampleData.pyc in dataLoader(self, path, schema)
103 def dataLoader(self, path, schema=None):
104 #TODO: if in Spark 2.0 or higher, use new API to load CSV
--> 105 load = ShellAccess["sqlContext"].read.format('com.databricks.spark.csv')
106 if schema is not None:
107 def getType(t):

AttributeError: 'NoneType' object has no attribute 'read'
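
For context, the traceback bottoms out in PixieDust's dataLoader, where ShellAccess["sqlContext"] resolves to None in a notebook that has no Spark kernel, so the .read access fails. A minimal sketch of that failure mode (not the PixieDust source, just an illustration):

sqlContext = None  # what PixieDust finds in a plain Python kernel without Spark
try:
    # mirrors the failing line: ShellAccess["sqlContext"].read.format(...)
    load = sqlContext.read.format('com.databricks.spark.csv')
except AttributeError as err:
    print(err)  # "'NoneType' object has no attribute 'read'", as in the traceback above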

@isc-rsingh
Member

@andrewzhang1 I cannot reproduce. This works for me with Python 2.7.11 and Spark 2.0.2 with Pixiedust 1.0.6.

import pixiedust
sftest = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")
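
(For anyone comparing the two environments: a quick, hypothetical check of whether the kernel exposes the Spark SQLContext that PixieDust 1.0.6 relies on. The variable name sqlContext is the one a PySpark-enabled kernel normally injects; it is absent in a plain Python kernel.)

try:
    sqlContext  # present in a Spark-enabled kernel, absent in a plain Python kernel
    print("Spark SQLContext found:", type(sqlContext))
except NameError:
    print("No sqlContext in this kernel -- sampleData() will hit the AttributeError above")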

@andrewzhang1
Author

@rajrsingh I attended David Taieb's demo yesterday, and I did not use IBM's Data Science cloud environment. Instead, I used my own local Jupyter on my Linux machine. I showed this error to David, and he told me there might be a bug there. I'd like to confirm that no Spark needs to be installed (or connected) on my local machine in order to use PixieDust.

@andrewzhang1
Author

By the way, I'm not the only one getting this error when using a local Jupyter notebook.

@DTAIEB
Member

DTAIEB commented Jun 9, 2017

@andrewzhang1 Yes, this is a valid bug that happens when running PixieDust on a plain Python Notebook without Spark.
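
Until a fix ships, one hedged workaround in a Spark-less notebook is to bypass sampleData() and pull the CSV straight into pandas, which does not need Spark (same URL as in the report):

import pandas as pd

url = "https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD"
df = pd.read_csv(url)  # pandas reads directly from the URL, no Spark required
print(df.shape)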

@DTAIEB DTAIEB self-assigned this Jun 9, 2017
@DTAIEB DTAIEB added this to the 1.0.7 milestone Jun 9, 2017
@DTAIEB DTAIEB modified the milestones: 1.0.8, 1.0.7 Jul 2, 2017
@vabarbosa vabarbosa self-assigned this Jul 11, 2017
@vabarbosa vabarbosa added the bug label Jul 11, 2017
vabarbosa added a commit that referenced this issue Jul 11, 2017: update sampleData() api to successfully load data when notebook is not using spark
@vabarbosa
Member

sampleData() has been updated in commit 90937ef (PR #404) to successfully load data into a pandas DataFrame when in a notebook without Spark.
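
Assuming the behavior described above, usage in a notebook without Spark would look roughly like this (the pandas return type is taken from the comment above, not independently verified):

import pixiedust

df = pixiedust.sampleData("https://data.sfgov.org/api/views/vv57-2fgy/rows.csv?accessType=DOWNLOAD")
print(type(df))  # expected: pandas.core.frame.DataFrame when no Spark is present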

@DTAIEB DTAIEB closed this as completed Jul 16, 2017