Following the example in: https://github.com/rapidsai-community/notebooks-contrib/blob/main/community_tutorials_and_guides/census_education2income_demo.ipynb
I have a laptop with 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz, 64 GB RAM. In addition to the Intel TigerLake-LP GT2 [Iris Xe Graphics], there is an Nvidia GPU as follows:
3D controller TU117GLM [Quadro T500 Mobile]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list rom
configuration: driver=nvidia latency=0
When I create a cluster I get only one worker, and when I compute anything the dashboard shows only the CPU working:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask.dataframe as dd
cluster = LocalCUDACluster(
    memory_limit='30GB',
    device_memory_limit='1.5GB',
    local_directory='./cache',
    threads_per_worker=8,
    rmm_pool_size="1.5GB",
    rmm_async=True,
    rmm_log_directory='./log',
)
client = Client(cluster)
I am trying to read the Backblaze 2022/2023 dataset, which is 730 CSV files totaling 62.6 GB on disk, using this code:
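import dask.dataframe as dd
import os
data_dir = './data/backblaze/'
# dtypes is a column-name -> dtype mapping defined elsewhere (not shown in this post).
# This will create a Dask data frame with a partition for every one of the 730 files.
df = dd.read_csv(os.path.join(data_dir, '*.csv'), dtype=dtypes)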
With one CPU worker and 8 threads, I hit severe bottlenecks on any computation; for example, value_counts takes a few hours.
Computing the min and max of the columns took a full day for one column, and it is still running for the others:
for cur_col in col_list:
    if check_str in cur_col:
        cur_min = df[cur_col].min().compute()
        cur_max = df[cur_col].max().compute()
        #if not math.isnan(cur_min):
        if cur_min != cur_max:
            print(" {:20s} {:15d} {:15d} ".format(cur_col, int(cur_min), int(cur_max)))
            loc_col_list.append(cur_col)
I need advice on how to get the GPU cores working to speed up the processing. I also need advice on purchasing the cheapest option for a home GPU cluster, using something like these options and in this price range:
I tried the same code on a V100 GPU on Google Colab, and it still does not use the GPU and is extremely slow. It has been running on the laptop since last week, and a few hours ago Google Colab reported that the GPU is not being used and that I should switch to the standard runtime. Can you please advise how I can use a CUDA data frame to read the 62 GB dataset and train RAPIDS algorithms on it?
Okay, so first, your GPU is far too small: it has only 1.5GB of usable GPU memory (probably the 2GB variant of the T500). This notebook was meant to run on a 32GB or larger GPU. In general, we recommend that you have a 16GB GPU to run our examples; however, we try to make accommodations for the 11GB x080s.
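To put the size mismatch in numbers, here is a rough back-of-the-envelope sketch using only the figures quoted in this thread. It assumes decimal units (1 GB = 1000 MB) and uses on-disk CSV size, which is not the same as the in-memory size once parsed, so treat the results as order-of-magnitude only.

```python
# Figures quoted in this thread.
dataset_gb = 62.6     # total size of the CSV files on disk
n_files = 730         # one Dask partition per file with the read_csv call above
usable_gpu_gb = 1.5   # usable memory on the Quadro T500

# Average on-disk size of one file (and so of one partition before parsing).
avg_file_mb = dataset_gb * 1000 / n_files

# How many times larger the dataset is than the usable GPU memory.
ratio = dataset_gb / usable_gpu_gb

print(f"average CSV size: {avg_file_mb:.1f} MB")               # average CSV size: 85.8 MB
print(f"dataset is about {ratio:.0f}x the usable GPU memory")  # dataset is about 42x the usable GPU memory
```

A single file fits on the device, but the dataset as a whole is roughly 42 times larger than the usable GPU memory, so most of it would have to be spilled to host memory or disk during any full-dataset computation.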
Hi
I am trying to read the Backblaze 2022/2023 dataset, which is 730 CSV files totaling 62.6 GB on disk.
With one CPU worker and 8 threads, I hit severe bottlenecks on any computation; for example, value_counts takes a few hours:
counts = df.model.value_counts(dropna=False).compute()
Computing the min and max of the columns took a full day for one column, and it is still running for the others.
I need advice on how to get the GPU cores working to speed up the processing. I also need advice on purchasing the cheapest option for a home GPU cluster, using something like these options and in this price range:
An external PCI-E chassis to connect to my laptop (although this one does not seem suitable for NVIDIA GPUs, please advise):
https://www.amazon.co.uk/gp/product/B0BJL7HKD8/ref=ox_sc_act_image_2?smid=A3P5ROKL5A1OLE&psc=1
and GPUs such as this one (or please advise on the best value-for-money alternatives):
https://www.amazon.co.uk/gp/product/B0C8ZQTRD7/ref=ox_sc_act_image_1?smid=A20CAXEAQ9QIMK&psc=1
Thank you very much in advance,
Manal