
add FreeBSD support #172

Merged
merged 3 commits into pytorch:main from cyyever:freebsd on Jun 2, 2024

Conversation

cyyever
Contributor

@cyyever cyyever commented Aug 7, 2023

No description provided.

@cyyever cyyever marked this pull request as draft August 7, 2023 14:14
@cyyever cyyever marked this pull request as ready for review August 7, 2023 14:18
src/freebsd/topology.c (outdated; resolved)
@cyyever cyyever force-pushed the freebsd branch 3 times, most recently from 19c60f5 to c57c80f Compare August 8, 2023 03:11
@cyyever cyyever mentioned this pull request Aug 8, 2023
@yurivict

yurivict commented Aug 8, 2023

The testcase below fails with a "Floating point exception" with this patch on FreeBSD 13.2.

import torch
import math

dtype = torch.float
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.set_default_device(device)

# Create Tensors to hold input and outputs.
# By default, requires_grad=False, which indicates that we do not need to
# compute gradients with respect to these Tensors during the backward pass.
x = torch.linspace(-math.pi, math.pi, 2000, dtype=dtype)
y = torch.sin(x)

# Create random Tensors for weights. For a third order polynomial, we need
# 4 weights: y = a + b x + c x^2 + d x^3
# Setting requires_grad=True indicates that we want to compute gradients with
# respect to these Tensors during the backward pass.
a = torch.randn((), dtype=dtype, requires_grad=True)
b = torch.randn((), dtype=dtype, requires_grad=True)
c = torch.randn((), dtype=dtype, requires_grad=True)
d = torch.randn((), dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y using operations on Tensors.
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss using operations on Tensors.
    # Now loss is a Tensor of shape (1,)
    # loss.item() gets the scalar value held in the loss.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute the backward pass. This call will compute the
    # gradient of loss with respect to all Tensors with requires_grad=True.
    # After this call a.grad, b.grad. c.grad and d.grad will be Tensors holding
    # the gradient of the loss with respect to a, b, c, d respectively.
    loss.backward()

    # Manually update weights using gradient descent. Wrap in torch.no_grad()
    # because weights have requires_grad=True, but we don't need to track this
    # in autograd.
    with torch.no_grad():
        a -= learning_rate * a.grad
        b -= learning_rate * b.grad
        c -= learning_rate * c.grad
        d -= learning_rate * d.grad

        # Manually zero the gradients after updating weights
        a.grad = None
        b.grad = None
        c.grad = None
        d.grad = None

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

@cyyever
Contributor Author

cyyever commented Aug 8, 2023

@yurivict
My result is

python3 a.py
99 1283.42333984375
199 853.2401123046875
299 568.3171997070312
399 379.5828552246094
499 254.54953002929688
599 171.7064666748047
699 116.80989074707031
799 80.42716217041016
899 56.31061553955078
999 40.32223129272461
1099 29.720619201660156
1199 22.68964195251465
1299 18.02572250366211
1399 14.931407928466797
1499 12.87797737121582
1599 11.514902114868164
1699 10.609926223754883
1799 10.008904457092285
1899 9.609664916992188
1999 9.344354629516602
Result: y = 0.0073340958915650845 + 0.8354617357254028 x + -0.0012652541045099497 x^2 + -0.09030362218618393 x^3

Can you try to reproduce on the main branch of PyTorch with cpuinfo replaced by my version?
Some modifications of main are required, so just try my fork: https://github.com/cyyever/pytorch/tree/freebsd.
There are no floating-point computations in this PR, only system calls. The error may be due to a corrupted PyTorch build; you can recompile everything from source and retry.

@yurivict

yurivict commented Aug 8, 2023

I tried with the recent release 2.0.1

@cyyever
Contributor Author

cyyever commented Aug 8, 2023

I tried with the recent release 2.0.1

Recompiling with PyTorch 2.0.1 should work. Also debug your code with valgrind and paste the result here.
Replace cpuinfo under torch/third_party, make sure PyTorch links against my version, and retry.
It is also possible that some bug in release 2.0.1 was fixed recently.

@yurivict

yurivict commented Aug 8, 2023

Here is what the cgdb screen looks like at the moment of failure:

 79│                 packages[i].processor_start = i * threads_per_package;
 80│                 packages[i].processor_count = threads_per_package;
 81│                 packages[i].core_start = i * cores_per_package;
 82│                 packages[i].core_count = cores_per_package;
 83│                 packages[i].cluster_start = i;
 84│                 packages[i].cluster_count = 1;
 85│                 cpuinfo_x86_format_package_name(x86_processor.vendor, brand_string, packages[i].name);
 86│         }
 87│         for (uint32_t i = 0; i < freebsd_topology.cores; i++) {
 88│                 cores[i] = (struct cpuinfo_core) {
 89│                         .processor_start = i * threads_per_core,
 90│                         .processor_count = threads_per_core,
 91│                         .core_id = i % cores_per_package,
 92├───────────────────────> .cluster = clusters + i / cores_per_package,
 93│                         .package = packages + i / cores_per_package,
 94│                         .vendor = x86_processor.vendor,
 95│                         .uarch = x86_processor.uarch,
 96│                         .cpuid = x86_processor.cpuid,
 97│                 };
 98│         }
 99│         for (uint32_t i = 0; i < freebsd_topology.threads; i++) {
100│                 const uint32_t smt_id = i % threads_per_core;
101│                 const uint32_t core_id = i / threads_per_core;
102│                 const uint32_t package_id = i / threads_per_package;
103│
104│                 /* Reconstruct APIC IDs from topology components */
/disk-samsung/pytorch-work/pytorch-v2.0.1/third_party/cpuinfo/src/x86/freebsd/init.c                                                                                                           
New UI allocated
(gdb) r x2.py 
Starting program: /usr/local/bin/python3.9 x2.py
warning: File "/usr/local/lib/libpython3.9.so.1.0-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /usr/local/lib/libpython3.9.so.1.0-gdb.py
line to your configuration file "/home/yuri/.config/gdb/gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/home/yuri/.config/gdb/gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

Program received signal SIGFPE, Arithmetic exception.
Integer divide by zero.
0x0000000808f87fa4 in cpuinfo_x86_freebsd_init () at /disk-samsung/pytorch-work/pytorch-v2.0.1/third_party/cpuinfo/src/x86/freebsd/init.c:92
92                              .cluster = clusters + i / cores_per_package,
(gdb) p cores_per_package 
$1 = 0
(gdb) p &cores_per_package
Address requested for identifier "cores_per_package" which is in register $r12
(gdb) p clusters
$2 = (struct cpuinfo_cluster *) 0x853e416c0
(gdb) p i
$4 = 0
(gdb) p cores_per_package 
$5 = 0
(gdb) 

@yurivict

yurivict commented Aug 8, 2023

cores_per_package is zero which is wrong.

@yurivict

yurivict commented Aug 8, 2023

Unfortunately, freebsd_topology isn't printed by the debugger because it is somehow "optimized out".

@cyyever cyyever force-pushed the freebsd branch 4 times, most recently from 9c60525 to 38bd8cf Compare August 9, 2023 01:47
@cyyever
Contributor Author

cyyever commented Aug 9, 2023

@yurivict I could reproduce the error on another host, and it is now fixed.

@cyyever cyyever force-pushed the freebsd branch 5 times, most recently from 6ffb02e to 7948b28 Compare August 9, 2023 03:13
Contributor

@Maratyszcza Maratyszcza left a comment


Reformat all files and lines to use tabs

CMakeLists.txt (outdated; resolved)
CMakeLists.txt (outdated; resolved)
CMakeLists.txt (outdated; resolved)
src/x86/freebsd/init.c (outdated; resolved)
@yurivict

yurivict commented Aug 9, 2023

Same Floating point exception with patches cb64777 and 7948b28.

@cyyever
Contributor Author

cyyever commented Aug 9, 2023

Same Floating point exception with patches cb64777 and 7948b28.
Please post the output of:

sysctl kern.sched.topology_spec

@yurivict

That's not what I have after the 5 patches:

struct cpuinfo_freebsd_topology cpuinfo_freebsd_detect_topology(void) {
        int packages = cpuinfo_from_freebsd_sysctl("kern.smp.cpus");
        int cores = cpuinfo_from_freebsd_sysctl("kern.smp.cores");
        int threads_per_core = cpuinfo_from_freebsd_sysctl("kern.smp.threads_per_core");
        cpuinfo_log_debug("freebsd topology: packages = %d, cores = %d, threads_per_core = %d", packages, cores, threads_per_core);
        struct cpuinfo_freebsd_topology topology = {
                .packages = (uint32_t) packages,
                .cores = (uint32_t) cores,
                .threads_per_core = (uint32_t) threads_per_core,
                .threads = (uint32_t) (threads_per_core * cores)
        };

        return topology;
}

Maybe you need to squash the commits.

@cyyever
Contributor Author

cyyever commented Aug 10, 2023

@yurivict The history is somewhat messy. Can you remove the local repository and re-clone? I will keep squashing until it is stable.

@yurivict

Sorry, I did not have all the patches applied before.
The current set of patches works fine on my machine.

@cyyever
Contributor Author

cyyever commented Aug 10, 2023

@yurivict Glad to see that. What is your CPU utilization? Can all cores reach 100%?

@yurivict

This example prints that torch.get_num_threads() is 4 for some reason.

@cyyever
Contributor Author

cyyever commented Aug 11, 2023

@yurivict

num_threads /= 2;

at line 42 of PyTorch's c10/core/thread_pool.h halves the thread count, so you get half of the cores.
There is no comment explaining the divide-by-two logic, and cpuinfo is not used there.

@yurivict

Hi @cyyever

I updated the same PR so that it merges with the latest cpuinfo revision: #230

I verified that it works with the OpenAI Whisper project and many simple test cases.

Could you please merge it?

Thank you,
Yuri

@cyyever
Contributor Author

cyyever commented Mar 30, 2024

@Maratyszcza Most of the listed issues have been fixed, except for the core count, which is what FreeBSD returns. Can you help review again?

@Maratyszcza
Contributor

I no longer maintain this project, maybe @malfet or @fbarchard could review?

@cyyever
Contributor Author

cyyever commented May 20, 2024

@fbarchard Help merge it?

@cyyever
Contributor Author

cyyever commented Jun 2, 2024

@malfet Help merge it?

Comment on lines +15 to +17
static inline uint32_t bit_mask(uint32_t bits) {
return (UINT32_C(1) << bits) - UINT32_C(1);
}
Contributor


Perhaps as a follow-up: since it's present in linux/init.c, mach/init.c, and windows/init.c, shouldn't it be moved to internal-api.h?

@malfet malfet merged commit 05332fd into pytorch:main Jun 2, 2024
1 check passed
@cyyever cyyever deleted the freebsd branch June 2, 2024 23:00
@amitdo

amitdo commented Jul 6, 2024

The README.md file should mention FreeBSD support.

@cyyever
Contributor Author

cyyever commented Jul 11, 2024

The README.md file should mention FreeBSD support.

The support is still experimental.

@amitdo

amitdo commented Jul 11, 2024

The support is still experimental.

I still think you should publish it. Just mention that it is experimental.
